Preface


Linear algebra is one of the core topics studied at university level 
by students on many different types of degree programme. Alongside 
calculus, it provides the framework for mathematical modelling in many 
diverse areas. This text sets out to introduce and explain linear algebra 
to students from any discipline. It covers all the material that would 
be expected to be in most first-year university courses in the subject, 
together with some more advanced material that would normally be 
taught later. 

The book has drawn on our extensive experience over a number of 
years in teaching first- and second-year linear algebra to LSE under- 
graduates and in providing self-study material for students studying at 
a distance. This text represents our best effort at distilling from our 
experience what it is that we think works best in helping students not 
only to do linear algebra, but to understand it. We regard understanding
as essential. ‘Understanding’ is not some fanciful intangible, to be
dismissed because it does not constitute a ‘demonstrable learning
outcome’: it is at the heart of what higher education (rather than merely
more education) is about. Linear algebra is a coherent, and beauti- 
ful, part of mathematics: manipulation of matrices and vectors leads, 
with a dash of abstraction, to the underlying concepts of vector spaces 
and linear transformations, in which contexts the more mechanical, 
manipulative, aspects of the subject make sense. It is worth striving for 
understanding, not only because of the inherent intellectual satisfaction, 
but because it pays off in other ways: it helps a student to work with the 
methods and techniques because he or she knows why these work and 
what they mean. 

Large parts of the material in this book have been adapted and devel- 
oped from lecture notes prepared by MH for the Mathematical Methods 
course at the LSE, a long-established course which has a large audience, 
and which has evolved over many years. Other parts have been influ- 
enced by MA’s teaching of non-specialist first-year courses and second- 
year linear algebra. Both of us have written self-study materials for 
students; some of the book is based on material originally produced
by us for the programmes in economics, management, finance and the 
social sciences by distance and flexible learning offered by the University
of London International Programmes (www.londoninternational.ac.uk).

We have attempted to write a user-friendly, fairly interactive and 
helpful text, and we intend that it could be useful not only as a course 
text, but for self-study. To this end, we have written in what we hope is an 
open and accessible — sometimes even conversational — style, and have 
included ‘learning outcomes’ and many ‘activities’ and ‘exercises’. We 
have also provided a very short introduction just to indicate some of the 
background which a reader should, ideally, possess (though if some of 
that is lacking, it can easily be acquired in passing). 

Reading a mathematics book properly cannot be a passive activity: 
the reader should interrogate the text and have pen and paper at the ready 
to check things. To help in this, the chapters contain many activities — 
prompts to a reader to be an ‘active’ reader, to pause for thought and 
really make sure they understand what has just been written, or to think 
ahead and anticipate what is to come next. At the end of chapters, there 
are comments on most of the activities, which a reader can consult to 
confirm his or her understanding. 

The main text of each chapter ends with a brief list of ‘learning 
outcomes’. These are intended to highlight the main aspects of the 
chapter, to help a reader review and consolidate what has been read. 

There are carefully designed exercises towards the end of each 
chapter, with full solutions (not just brief answers) provided at the end 
of the book. These exercises vary in difficulty from the routine to the 
more challenging, and they are one of the key ingredients in helping a 
reader check his or her understanding of the material. Of course, these 
are best made use of by attempting them seriously before consulting the 
solution. (It’s all very easy to read and agree with a solution, but unless 
you have truly grappled with the exercise, the benefits of doing so will 
be limited.) 

We also provide sets of additional exercises at the end of each 
chapter, which we call Problems as the solutions are not given. We hope 
they will be useful for assignments by teachers using this book, who will 
be able to obtain solutions from the book’s webpage. Students will gain 
confidence by tackling, and solving, these problems, and will be able to 
check many of their answers using the techniques given in the chapter. 

Over the years, many people — students and colleagues — have 
influenced and informed the way we approach the teaching of linear 
algebra, and we thank them all. 

Preliminaries: before we begin


This short introductory chapter discusses some very basic aspects of 
mathematics and mathematical notation that it would be useful to be 
comfortable with before proceeding. We imagine that you have studied 
most (if not all) of these topics in previous mathematics courses and 
that nearly all of the material is revision, but don’t worry if a topic is 
new to you. We will mention the main results which you will need to 
know. If you are unfamiliar with a topic, or if you find any of the topics 
difficult, then you should look up that topic in any basic mathematics 
text. 


Sets and set notation 


A set may be thought of as a collection of objects. A set is usually 
described by listing or describing its members inside curly brackets. 
For example, when we write A = {1, 2, 3}, we mean that the objects
belonging to the set A are the numbers 1, 2, 3 (or, equivalently, the set 
A consists of the numbers 1, 2 and 3). Equally (and this is what we 
mean by ‘describing’ its members), this set could have been written 
as 


\[ A = \{\, n \mid n \text{ is a whole number and } 1 \le n \le 3 \,\}. \]


Here, the symbol | stands for ‘such that’. (Sometimes, the symbol ‘:’ is 
used instead.) As another example, the set 


\[ B = \{\, x \mid x \text{ is a reader of this book} \,\} \]

has as its members all of you (and nothing else). When x is an object
in a set A, we write $x \in A$ and say ‘x belongs to A’ or ‘x is a member
of A’.

The set which has no members is called the empty set and is denoted
by $\emptyset$. The empty set may seem like a strange concept, but it has its uses.

We say that the set S is a subset of the set T, and we write $S \subseteq T$,
or $S \subset T$, if every member of S is a member of T. For example,
$\{1, 2, 5\} \subseteq \{1, 2, 4, 5, 6, 40\}$. The difference between the two symbols
is that $S \subset T$ means that S is a proper subset of T, meaning not all
of T, and $S \subseteq T$ means that S is a subset of T and possibly (but not
necessarily) all of T. So in the example just given we could have also
written $\{1, 2, 5\} \subset \{1, 2, 4, 5, 6, 40\}$.

Given two sets A and B, the union $A \cup B$ is the set whose members
belong to A or B (or both A and B); that is,

\[ A \cup B = \{\, x \mid x \in A \text{ or } x \in B \,\}. \]

For example, if $A = \{1, 2, 3, 5\}$ and $B = \{2, 4, 5, 7\}$, then
$A \cup B = \{1, 2, 3, 4, 5, 7\}$.

Similarly, we define the intersection $A \cap B$ to be the set whose
members belong to both A and B:

\[ A \cap B = \{\, x \mid x \in A \text{ and } x \in B \,\}. \]

So, if $A = \{1, 2, 3, 5\}$ and $B = \{2, 4, 5, 7\}$, then $A \cap B = \{2, 5\}$.
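
If you have access to a computer, the set operations above can be tried out directly. The following is a minimal sketch in Python (our choice of language, not something the text relies on); the variable names are ours, and the sets are the ones used in the examples.

```python
# The sets A and B from the union and intersection examples.
A = {1, 2, 3, 5}
B = {2, 4, 5, 7}

union = A | B          # members of A or B (or both)
intersection = A & B   # members of both A and B

# Subset tests: <= allows equality, < requires a proper subset.
S = {1, 2, 5}
T = {1, 2, 4, 5, 6, 40}
is_subset = S <= T
is_proper_subset = S < T

# 'Describing' the members corresponds to a set comprehension:
# {n | n is a whole number and 1 <= n <= 3}.
described = {n for n in range(1, 4)}

# The empty set has no members.
empty = set()

print(union, intersection, is_subset, is_proper_subset, described, len(empty))
```

Note that Python does not care about the order in which the members of a set are listed, which matches the mathematical convention.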


Numbers 


There are some standard notations for important sets of numbers. The 
set $\mathbb{R}$ of real numbers, the ‘normal’ numbers you are familiar with,
may be thought of as the points on a line. Each such number can be 
described by a decimal representation. 

The set of real numbers $\mathbb{R}$ includes the following subsets: $\mathbb{N}$, the set
of natural numbers, $\mathbb{N} = \{1, 2, 3, \ldots\}$, also referred to as the positive
integers; $\mathbb{Z}$, the set of all integers, $\{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$; and
$\mathbb{Q}$, the set of rational numbers, which are numbers that can be written as
fractions, $p/q$, with $p, q \in \mathbb{Z}$, $q \ne 0$. In addition to the real numbers,
there is the set $\mathbb{C}$ of complex numbers. You may have seen these before,
but don’t worry if you have not; we cover the basics at the start of 
Chapter 13, when we need them. 

The absolute value of a real number a is defined by

\[ |a| = \begin{cases} a & \text{if } a \ge 0 \\ -a & \text{if } a < 0. \end{cases} \]

So the absolute value of a equals a if a is non-negative (that is, if $a \ge 0$),
and equals $-a$ otherwise. For instance, $|6| = 6$ and $|-2.5| = 2.5$. Note
that

\[ \sqrt{a^2} = |a|, \]

since by $\sqrt{x}$ we always mean the non-negative square root to avoid
ambiguity. So the two solutions of the equation $x^2 = 4$ are $x = \pm 2$
(meaning $x = 2$ or $x = -2$), but $\sqrt{4} = 2$.

The absolute value of real numbers satisfies the following 
inequality: 


\[ |a + b| \le |a| + |b|, \qquad a, b \in \mathbb{R}. \]
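
As a quick numerical illustration, the sketch below implements the piecewise definition of the absolute value and checks it on a handful of sample values. The function name `absolute_value` and the sample values are our own choices, and checking finitely many cases is of course not a proof of the inequality.

```python
import math

def absolute_value(a):
    """Return |a| from the piecewise definition: a if a >= 0, otherwise -a."""
    return a if a >= 0 else -a

# Sample values chosen so that all the floating-point arithmetic is exact.
samples = [6, -2.5, 0, 3.75, -4]

# |a| agrees with the non-negative square root of a^2 for each sample.
sqrt_matches = all(math.sqrt(a * a) == absolute_value(a) for a in samples)

# The inequality |a + b| <= |a| + |b| holds for every pair of samples.
triangle_holds = all(
    absolute_value(a + b) <= absolute_value(a) + absolute_value(b)
    for a in samples
    for b in samples
)

print(absolute_value(6), absolute_value(-2.5), sqrt_matches, triangle_holds)
```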


Having defined $\mathbb{R}$, we can define the set $\mathbb{R}^2$ of ordered pairs $(x, y)$ of
real numbers. Thus, $\mathbb{R}^2$ is the set usually depicted as the set of points in
a plane, x and y being the coordinates of a point with respect to a pair
of axes. For instance, $(-1, 3/2)$ is an element of $\mathbb{R}^2$ lying to the left of
and above $(0, 0)$, which is known as the origin.


Mathematical terminology 


In this book, as in most mathematics texts, we use the words ‘definition’, 
‘theorem’ and ‘proof’, and it is important not to be daunted by this 
language if it is unusual to you. A definition is simply a precise statement 
of what a particular idea or concept means. Definitions are hugely 
important in mathematics, because it is a precise subject. A theorem is 
just a statement or result. A proof is an explanation as to why a theorem 
is true. As a fairly trivial example, consider the following: 


Definition: An integer n is even if it is a multiple of 2; that is, if $n = 2k$
for some integer k.


Note that this is a precise statement telling us what the word ‘even’ 
means. It is not to be taken as a ‘result’: it’s defining what the word 
‘even’ means. 


Theorem: The sum of two even integers is even. That is, if m, n are
even, so is $m + n$.


Proof. Suppose m, n are even. Then, by the definition, there are integers
k, l such that $m = 2k$ and $n = 2l$. Then

\[ m + n = 2k + 2l = 2(k + l). \]

Since $k + l$ is an integer, it follows that $m + n$ is even. $\square$

Note that, as here, we often use the symbol $\square$ to denote the end of
a proof. This is just to make it clear where the proof ends and the
following text begins.

Occasionally, we use the term ‘corollary’. A corollary is simply a 
result that is a consequence of a theorem and perhaps isn’t ‘big’ enough 
to be called a theorem in its own right. 

Don’t worry about this terminology if you haven’t met it before. It 
will become familiar as you work through the book. 


Basic algebra 
Algebraic manipulation 


You should be capable of manipulating simple algebraic expressions 
and equations. 
You should be proficient in:

• collecting up terms; for example, $2a + 3b - a + 5b = a + 8b$
• multiplication of variables; for example,

\[ a(-b) - 3ab + (-2a)(-4b) = -ab - 3ab + 8ab = 4ab \]

• expansion of bracketed terms; for example,

\[ -(a - 2b) = -a + 2b, \]
\[ (2x - 3y)(x + 4y) = 2x^2 - 3xy + 8xy - 12y^2 = 2x^2 + 5xy - 12y^2. \]


Powers 


When n is a positive integer, the nth power of the number a, denoted
$a^n$, is simply the product of n copies of a; that is,

\[ a^n = \underbrace{a \times a \times a \times \cdots \times a}_{n \text{ times}}. \]

The number n is called the power, exponent or index. We have the power
rules (or rules of exponents),

\[ a^r a^s = a^{r+s}, \qquad (a^r)^s = a^{rs}, \]

whenever r and s are positive integers.
The power $a^0$ is defined to be 1.
The definition is extended to negative integers as follows. When n
is a positive integer, $a^{-n}$ means $1/a^n$. For example, $3^{-2}$ is $1/3^2 = 1/9$.
The power rules hold when r and s are any integers, positive, negative
or zero.

When n is a positive integer, $a^{1/n}$ is the positive nth root of a; this
is the positive number x such that $x^n = a$. For example, $a^{1/2}$ is usually
denoted by $\sqrt{a}$, and is the positive square root of a, so that $4^{1/2} = 2$.

When m and n are integers and n is positive, $a^{m/n}$ is $(a^{1/n})^m$. This
extends the definition of powers to the rational numbers (numbers which
can be written as fractions). The definition is extended to real numbers
by ‘filling in the gaps’ between the rational numbers, and it can be
shown that the rules of exponents still apply.
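
The power rules are easy to experiment with. The sketch below checks them for one particular choice of a, r and s (the values are ours, picked only for illustration), using Python's exact `Fraction` type for the integer powers and a tolerance for the fractional ones, where floating-point rounding can occur.

```python
import math
from fractions import Fraction

a = Fraction(3)      # exact rational arithmetic avoids rounding error
r, s = 2, -5         # any integers: positive, negative or zero

product_rule = a**r * a**s == a**(r + s)   # a^r a^s = a^(r+s)
power_rule = (a**r)**s == a**(r * s)       # (a^r)^s = a^(rs)
zero_power = a**0                          # defined to be 1
negative_power = a**-2                     # 3^(-2) = 1/3^2 = 1/9

# Fractional powers use floating point, so compare with a tolerance.
square_root = 4 ** 0.5                     # 4^(1/2)
m, n = 2, 3
rational_power_agrees = math.isclose(8 ** (m / n), (8 ** (1 / n)) ** m)

print(product_rule, power_rule, zero_power, negative_power, square_root)
```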


Quadratic equations 


It is straightforward to find the solution of a linear equation, one of 
the form $ax + b = 0$ where $a, b \in \mathbb{R}$. By a solution, we mean a real
number x for which the equation is true. 

A common problem is to find the set of solutions of a quadratic
equation

\[ ax^2 + bx + c = 0, \]

where we may as well assume that $a \ne 0$, because if $a = 0$ the equation
reduces to a linear one. In some cases, the quadratic expression can
be factorised, which means that it can be written as the product of two
linear terms. For example,

\[ x^2 - 6x + 5 = (x - 1)(x - 5), \]

so the equation $x^2 - 6x + 5 = 0$ becomes $(x - 1)(x - 5) = 0$. Now,
the only way that two numbers can multiply to give 0 is if at least one
of the numbers is 0, so we can conclude that $x - 1 = 0$ or $x - 5 = 0$;
that is, the equation has two solutions, 1 and 5.

Although factorisation may be difficult, there is a general method for
determining the solutions to a quadratic equation using the quadratic
formula, as follows. Suppose we have the quadratic equation
$ax^2 + bx + c = 0$, where $a \ne 0$. Then the solutions of this equation are

\[ x_1 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}, \qquad x_2 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}. \]

The term $b^2 - 4ac$ is called the discriminant.


• If $b^2 - 4ac > 0$, the equation has two real solutions as given above.

• If $b^2 - 4ac = 0$, the equation has exactly one solution,
$x = -b/(2a)$. (In this case, we say that this is a solution of multiplicity
two.)

• If $b^2 - 4ac < 0$, the equation has no real solutions. (It will have
complex solutions, but we explain this in Chapter 13.)


For example, consider the equation $2x^2 - 7x + 3 = 0$. Using the
quadratic formula, we have

\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{7 \pm \sqrt{49 - 4(2)(3)}}{2(2)} = \frac{7 \pm 5}{4}. \]

So the solutions are $x = 3$ and $x = \frac{1}{2}$.


The equation $x^2 + 6x + 9 = 0$ has one solution of multiplicity 2; its
discriminant is $b^2 - 4ac = 36 - 9(4) = 0$. This equation is most easily
solved by recognising that $x^2 + 6x + 9 = (x + 3)^2$, so the solution is
$x = -3$.

On the other hand, consider the quadratic equation 


\[ x^2 - 2x + 3 = 0; \]

here we have $a = 1$, $b = -2$, $c = 3$. The quantity $b^2 - 4ac$ is negative,
so this equation has no real solutions. This is less mysterious than it
may seem. We can write the equation as $(x - 1)^2 + 2 = 0$. Rewriting
the left-hand side of the equation in this form is known as completing 
the square. Now, the square of a number is always greater than or equal 
to 0, so the quantity on the left of this equation is always at least 2 and 
is therefore never equal to 0. The quadratic formula for the solutions to 
a quadratic equation is obtained using the technique of completing the 
square. Quadratic polynomials which cannot be written as a product of 
linear terms (so ones for which the discriminant is negative) are said to 
be irreducible. 
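
The three cases for the discriminant translate directly into a short program. What follows is a sketch only: the function name `solve_quadratic` and the decision to return the real solutions as a sorted list are our own conventions, and the equations tried are the ones discussed in this section.

```python
import math

def solve_quadratic(a, b, c):
    """Return the real solutions of ax^2 + bx + c = 0 as a sorted list."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic equation")
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []                      # no real solutions
    if discriminant == 0:
        return [-b / (2 * a)]          # one solution of multiplicity two
    root = math.sqrt(discriminant)     # non-negative square root
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])

print(solve_quadratic(1, -6, 5))   # x^2 - 6x + 5 = 0
print(solve_quadratic(2, -7, 3))   # 2x^2 - 7x + 3 = 0
print(solve_quadratic(1, 6, 9))    # x^2 + 6x + 9 = 0
print(solve_quadratic(1, -2, 3))   # x^2 - 2x + 3 = 0
```

An empty list signals a negative discriminant; that is, a quadratic with no real solutions.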


Polynomial equations 


A polynomial of degree n in x is an expression of the form

\[ P_n(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n, \]

where the $a_i$ are real constants, $a_n \ne 0$, and x is a real variable. For
example, a quadratic expression such as those discussed above is a
polynomial of degree 2.

A polynomial equation of degree n has at most n solutions. For
example, since

\[ x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3), \]

the equation $x^3 - 7x + 6 = 0$ has three solutions; namely, $1, 2, -3$.
The solutions of the equation $P_n(x) = 0$ are called the roots or zeros
of the polynomial. Unfortunately, there is no general straightforward
formula (as there is for quadratics) for the solutions to $P_n(x) = 0$ for
polynomials $P_n$ of degree larger than 2.

To find the solutions to $P(x) = 0$, where P is a polynomial of degree
n, we use the fact that if $\alpha$ is such that $P(\alpha) = 0$, then $(x - \alpha)$ must
be a factor of $P(x)$. We find such an $\alpha$ by trial and error and then write
$P(x)$ in the form $(x - \alpha)Q(x)$, where $Q(x)$ is a polynomial of degree
$n - 1$.

As an example, we’ll use this method to factorise the cubic polynomial
$x^3 - 7x + 6$. Note that if this polynomial can be expressed as a
product of linear factors, then it will be of the form

\[ x^3 - 7x + 6 = (x - r_1)(x - r_2)(x - r_3), \]

where its constant term is the negative of the product of the roots:
$6 = -r_1 r_2 r_3$. (To see this, just substitute $x = 0$ into both sides of the
above equation.) So
if there is an integer root, it will be a factor of 6. We will try $x = 1$.
Substituting this value for x, we do indeed get $1 - 7 + 6 = 0$, so $(x - 1)$
is a factor. Then we can deduce that

\[ x^3 - 7x + 6 = (x - 1)(x^2 + \lambda x - 6) \]

for some number $\lambda$, as the coefficient of $x^2$ must be 1 for the product to
give $x^3$, and the constant term must be $-6$ so that $(-1)(-6) = 6$, the
constant term in the cubic. It only remains to find $\lambda$. This is accomplished
by comparing the coefficients of either $x^2$ or x in the cubic polynomial
and the product. The coefficient of $x^2$ in the cubic is 0, and in the product
the coefficient of $x^2$ is obtained from the terms $(-1)(x^2) + (x)(\lambda x)$, so
that we must have $\lambda - 1 = 0$ or $\lambda = 1$. Then

\[ x^3 - 7x + 6 = (x - 1)(x^2 + x - 6), \]

and the quadratic term is easily factorised into $(x - 2)(x + 3)$; that is,

\[ x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3). \]
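
The trial-and-error search for an integer root, followed by division by the corresponding linear factor, can be sketched as follows. The function names and the convention of listing coefficients from the highest power down are our own assumptions, and the sketch assumes integer coefficients with a non-zero constant term.

```python
def evaluate(coeffs, x):
    """Evaluate a polynomial; coeffs are listed from the highest power down."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def integer_roots(coeffs):
    """Try every (positive and negative) factor of the constant term."""
    constant = abs(coeffs[-1])
    factors = [d for d in range(1, constant + 1) if constant % d == 0]
    candidates = sorted(factors + [-d for d in factors])
    return [r for r in candidates if evaluate(coeffs, r) == 0]

def divide_by_linear(coeffs, root):
    """Return Q(x) where P(x) = (x - root) Q(x), assuming root is a root of P."""
    quotient = []
    carry = 0
    for c in coeffs[:-1]:
        carry = carry * root + c
        quotient.append(carry)
    return quotient

cubic = [1, 0, -7, 6]                  # x^3 + 0x^2 - 7x + 6
roots = integer_roots(cubic)
quadratic = divide_by_linear(cubic, 1)

print(roots)       # the integer roots found by trial
print(quadratic)   # coefficients of the quadratic factor after removing (x - 1)
```

Here `quadratic` holds the coefficients 1, 1, −6; that is, the factor $x^2 + x - 6$ obtained in the worked example.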


Trigonometry 


The trigonometrical functions, $\sin\theta$ and $\cos\theta$ (the sine function and
cosine function), are very important in mathematics. You should know
their geometrical meaning. (In a right-angled triangle, $\sin\theta$ is the ratio
of the length of the side opposite the angle $\theta$ to the length of the
hypotenuse, the longest side of the triangle; and $\cos\theta$ is the ratio of the
length of the side adjacent to the angle to the length of the hypotenuse.)

It is important to realise that throughout this book angles are measured
in radians rather than degrees. The conversion is as follows: 180
degrees equals $\pi$ radians, where $\pi$ is the number 3.141.... It is good
practice not to expand $\pi$ or multiples of $\pi$ as decimals, but to leave them
in terms of the symbol $\pi$. For example, since 60 degrees is one-third of
180 degrees, it follows that in radians 60 degrees is $\pi/3$.

The sine and cosine functions are related by the fact that
$\cos x = \sin(x + \frac{\pi}{2})$, and they always take a value between 1 and $-1$.
Table 1 gives some important values of the trigonometrical functions.

There are some useful results about the trigonometrical functions,
which we use now and again. In particular, for any angles $\theta$ and $\phi$, we
have

\[ \sin^2\theta + \cos^2\theta = 1, \]
\[ \sin(\theta + \phi) = \sin\theta\cos\phi + \cos\theta\sin\phi \]

and

\[ \cos(\theta + \phi) = \cos\theta\cos\phi - \sin\theta\sin\phi. \]


Table 1

$\theta$      $\sin\theta$      $\cos\theta$
0             0                 1
$\pi/6$       $1/2$             $\sqrt{3}/2$
$\pi/4$       $1/\sqrt{2}$      $1/\sqrt{2}$
$\pi/3$       $\sqrt{3}/2$      $1/2$
$\pi/2$       1                 0
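
The values in Table 1 and the identities above can be checked numerically. The sketch below uses Python's standard `math` module, which works in radians; the helper `close` and the two test angles are our own choices, and comparisons are made with a small tolerance because the computer stores $\pi$ only approximately.

```python
import math

# (theta, sin theta, cos theta) for the standard angles.
table = [
    (0.0, 0.0, 1.0),
    (math.pi / 6, 1 / 2, math.sqrt(3) / 2),
    (math.pi / 4, 1 / math.sqrt(2), 1 / math.sqrt(2)),
    (math.pi / 3, math.sqrt(3) / 2, 1 / 2),
    (math.pi / 2, 1.0, 0.0),
]

def close(x, y):
    """Compare two floating-point numbers up to rounding error."""
    return math.isclose(x, y, abs_tol=1e-12)

table_ok = all(
    close(math.sin(t), s) and close(math.cos(t), c) for t, s, c in table
)

theta, phi = 0.7, 1.9   # two arbitrary angles, in radians
pythagorean = close(math.sin(theta) ** 2 + math.cos(theta) ** 2, 1.0)
sine_sum = close(
    math.sin(theta + phi),
    math.sin(theta) * math.cos(phi) + math.cos(theta) * math.sin(phi),
)
cosine_sum = close(
    math.cos(theta + phi),
    math.cos(theta) * math.cos(phi) - math.sin(theta) * math.sin(phi),
)
shift = close(math.cos(theta), math.sin(theta + math.pi / 2))
sixty_degrees = math.radians(60)   # conversion from degrees to radians

print(table_ok, pythagorean, sine_sum, cosine_sum, shift)
```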


A little bit of logic 


It is very important to understand the formal meaning of the word ‘if’ 
in mathematics. The word is often used rather sloppily in everyday life, 
but has a very precise mathematical meaning. Let’s give an example. 
Suppose someone tells you ‘If it rains, then I wear a raincoat’, and 
suppose that this is a true statement. Well, then suppose it rains. You 
can certainly conclude the person will wear a raincoat. But what if it 
does not rain? Well, you can’t conclude anything. The statement only 
tells you about what happens if it rains. If it does not, then the person 
might, or might not, wear a raincoat. You have to be clear about this: 
an ‘if—then’ statement only tells you about what follows if something 
particular happens. 

More formally, suppose P and Q are mathematical statements (each
of which can therefore be either true or false). Then we can form the
statement denoted $P \Rightarrow Q$ (‘P implies Q’ or, equivalently, ‘if P, then
Q’), which means ‘if P is true, then Q is true’. For instance, consider
the theorem we used as an example earlier. This says that if m, n are
even integers, then so is $m + n$. We can write this as

\[ m, n \text{ even integers} \;\Rightarrow\; m + n \text{ is even}. \]

The converse of a statement $P \Rightarrow Q$ is $Q \Rightarrow P$, and whether that
is true or not is a separate matter. For instance, the converse of the
statement just made is

\[ m + n \text{ is even} \;\Rightarrow\; m, n \text{ even integers}. \]

This is false. For instance, $1 + 3$ is even, but 1 and 3 are not.

If, however, both statements $P \Rightarrow Q$ and $Q \Rightarrow P$ are true, then
we say that Q is true if and only if P is. Alternatively, we say that P
and Q are equivalent. We use the single piece of notation $P \Leftrightarrow Q$
instead of the two separate $P \Rightarrow Q$ and $Q \Rightarrow P$.
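
The meaning of ‘if P, then Q’ can be made concrete with a small experiment. In the sketch below, `implies` encodes the convention that an implication can only fail when P is true and Q is false; the function names and the range of integers searched are our own choices. A search over finitely many integers is not a proof of the theorem, but a single counterexample is enough to show that the converse is false.

```python
def implies(p, q):
    """'If p then q' is false only when p is true and q is false."""
    return (not p) or q

def is_even(n):
    return n % 2 == 0

pairs = [(m, n) for m in range(-10, 11) for n in range(-10, 11)]

# The theorem: m, n even  =>  m + n even.
theorem_holds = all(
    implies(is_even(m) and is_even(n), is_even(m + n)) for m, n in pairs
)

# The converse: m + n even  =>  m, n even.
counterexamples = [
    (m, n)
    for m, n in pairs
    if not implies(is_even(m + n), is_even(m) and is_even(n))
]

print(theorem_holds, (1, 3) in counterexamples)
```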

Matrices and vectors will be the central objects in our study of linear
algebra. In this chapter, we introduce matrices, study their properties 
and learn how to manipulate them. This will lead us to a study of vectors, 
which can be thought of as a certain type of matrix, but which can more 
usefully be viewed geometrically and applied with great effect to the 
study of lines and planes. 


1.1 What is a matrix? 


Definition 1.1 (Matrix) A matrix is a rectangular array of numbers or
symbols. It can be written as

\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. \]

We denote this array by the single letter A or by $(a_{ij})$, and we say that
A has m rows and n columns, or that it is an $m \times n$ matrix. We also say
that A is a matrix of size $m \times n$.

The number $a_{ij}$ in the ith row and jth column is called the $(i, j)$
entry. Note that the first subscript on $a_{ij}$ always refers to the row and
the second subscript to the column.


Example 1.2 The matrix

\[ A = \begin{pmatrix} 2 & 1 & 7 & 8 \\ 0 & -2 & 5 & -1 \\ 4 & 9 & 3 & 0 \end{pmatrix} \]

is a $3 \times 4$ matrix whose entries are integers. For this matrix, $a_{23} = 5$,
since this is the entry in the second row and third column.
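
On a computer, one simple way to store a matrix is as a list of its rows. The sketch below does this for the matrix of Example 1.2; the helper names `size` and `entry` are ours. Note that Python counts positions from 0, whereas the subscripts on $a_{ij}$ count from 1, so the helper subtracts 1 from each index.

```python
# The 3 x 4 matrix of Example 1.2, stored as a list of rows.
A = [
    [2, 1, 7, 8],
    [0, -2, 5, -1],
    [4, 9, 3, 0],
]

def size(matrix):
    """Return (number of rows, number of columns)."""
    return len(matrix), len(matrix[0])

def entry(matrix, i, j):
    """Return the (i, j) entry, with i and j counted from 1 as in the text."""
    return matrix[i - 1][j - 1]

print(size(A))          # rows and columns
print(entry(A, 2, 3))   # second row, third column
```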


Activity 1.3 In Example 1.2 above, what is $a_{32}$?


A square matrix is an $n \times n$ matrix; that is, a matrix with the same
number of rows as columns. The diagonal of a square matrix is the list
of entries $a_{11}, a_{22}, \ldots, a_{nn}$.

A diagonal matrix is a square matrix with all the entries which are
not on the diagonal equal to 0. So A is diagonal if it is $n \times n$ and $a_{ij} = 0$
if $i \ne j$. Then A looks as follows:

\[ \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}. \]


Activity 1.4 Which of these matrices are diagonal? 


—3 0 0 0 0 0 
(0 21); (o a o); a 
0 0 1 0 0 2 


Definition 1.5 (Equality) Two matrices are equal if they are the same
size and if corresponding entries are equal. That is, if $A = (a_{ij})$ and
$B = (b_{ij})$ are both $m \times n$ matrices, then

\[ A = B \iff a_{ij} = b_{ij}, \qquad 1 \le i \le m, \; 1 \le j \le n. \]


1.2 Matrix addition and scalar multiplication 


If A and B are two matrices, then provided they are the same size we 
can add them together to form a new matrix A + B. We define A + B 
to be the matrix whose entries are the sums of the corresponding entries 
in A and B. 


Definition 1.6 (Addition) If $A = (a_{ij})$ and $B = (b_{ij})$ are both $m \times n$
matrices, then

\[ A + B = (a_{ij} + b_{ij}), \qquad 1 \le i \le m, \; 1 \le j \le n. \]


We can also multiply any matrix by a real number, referred to as a scalar
in this context. If $\lambda$ is a scalar and A is a matrix, then $\lambda A$ is the matrix
whose entries are $\lambda$ times each of the entries of A.

Definition 1.7 (Scalar multiplication) If $A = (a_{ij})$ is an $m \times n$ matrix
and $\lambda \in \mathbb{R}$, then

\[ \lambda A = (\lambda a_{ij}), \qquad 1 \le i \le m, \; 1 \le j \le n. \]

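Example 1.8

\[ A + B = \begin{pmatrix} 3 & 1 & 2 \\ 0 & 5 & -2 \end{pmatrix} + \begin{pmatrix} -1 & 1 & 4 \\ 2 & -3 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 2 & 6 \\ 2 & 2 & -1 \end{pmatrix}, \]

\[ -2A = -2 \begin{pmatrix} 3 & 1 & 2 \\ 0 & 5 & -2 \end{pmatrix} = \begin{pmatrix} -6 & -2 & -4 \\ 0 & -10 & 4 \end{pmatrix}. \]
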
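
Both operations work entry by entry, so they are short to program. The following sketch stores matrices as lists of rows; the function names `add` and `scalar_multiply`, and the matrices P and Q, are hypothetical choices of ours made purely for illustration.

```python
def add(A, B):
    """Return A + B, defined only when A and B are the same size."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must be the same size to be added")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar_multiply(lam, A):
    """Return the matrix whose entries are lam times the entries of A."""
    return [[lam * a for a in row] for row in A]

# Two illustrative 2 x 3 matrices (not taken from the text).
P = [[1, 0, 2], [-1, 3, 4]]
Q = [[2, 5, -2], [0, 1, 1]]

print(add(P, Q))
print(scalar_multiply(-2, P))
```

Trying to add matrices of different sizes raises an error, mirroring the requirement in Definition 1.6 that both matrices be $m \times n$.
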
1.3 Matrix multiplication 


Is there a way to multiply two matrices together? The answer is sometimes,
depending on the sizes of the matrices. If A and B are matrices
such that the number of columns of A is equal to the number of rows
of B, then we can define a matrix C which is the product of A and
B. We do this by saying what the entry $c_{ij}$ of the product matrix AB
should be.


Definition 1.9 (Matrix multiplication) If A is an $m \times n$ matrix and
B is an $n \times p$ matrix, then the product is the matrix $AB = C = (c_{ij})$
with

\[ c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}. \]


Although this formula looks daunting, it is quite easy to use in practice.
What it says is that the element in row i and column j of the product
is obtained by taking each entry of row i of A and multiplying it by the
corresponding entry of column j of B, then adding these n products
together.

\[ \text{row } i \text{ of } A \;\rightarrow\; \begin{pmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{pmatrix} \begin{pmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{pmatrix} \;\leftarrow\; \text{column } j \text{ of } B \]

What size is $C = AB$? The matrix C must be $m \times p$ since it will have
one entry for each of the m rows of A and each of the p columns of B.

Example 1.10 In the following product, the element in row 2 and
column 1 of the product matrix (indicated in bold type) is found, as
described above, by using the row and column printed in bold type.

\[ AB = \begin{pmatrix} 1 & 1 & 1 \\ \mathbf{2} & \mathbf{0} & \mathbf{1} \\ 1 & 2 & 4 \\ 2 & 2 & -1 \end{pmatrix} \begin{pmatrix} \mathbf{3} & 0 \\ \mathbf{1} & 1 \\ \mathbf{-1} & 3 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ \mathbf{5} & 3 \\ 1 & 14 \\ 9 & -1 \end{pmatrix} \]

This entry is 5 because

\[ (2)(3) + (0)(1) + (1)(-1) = 5. \]

Notice the sizes of the three matrices. A is $4 \times 3$, B is $3 \times 2$, and the
product AB is $4 \times 2$.
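
The row-times-column rule of Definition 1.9 can be written out as a short function. This is a sketch under our own conventions (matrices as lists of rows, a function we have called `multiply`); the matrices are those of Example 1.10.

```python
def multiply(A, B):
    """Return AB; the number of columns of A must equal the number of rows of B."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    p = len(B[0])
    # Entry (i, j) is the sum of products of row i of A with column j of B.
    return [
        [sum(row[k] * B[k][j] for k in range(n)) for j in range(p)]
        for row in A
    ]

A = [[1, 1, 1], [2, 0, 1], [1, 2, 4], [2, 2, -1]]   # 4 x 3
B = [[3, 0], [1, 1], [-1, 3]]                        # 3 x 2

C = multiply(A, B)                                   # 4 x 2
print(C)
print(C[1][0])   # row 2, column 1 of the product (Python counts from 0)
```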


We shall see in later chapters that this definition of matrix multiplication 
is exactly what is needed for applying matrices in our study of linear 
algebra. 

It is an important consequence of this definition that: 


• $AB \ne BA$ in general. That is, matrix multiplication is not ‘commutative’.


To see just how non-commutative matrix multiplication is, let’s look at 
some examples, starting with the two matrices A and B in the example 
above. The product AB is defined, but the product BA is not even 
defined. Since A is $4 \times 3$ and B is $3 \times 2$, it is not possible to multiply
the matrices in the order BA.

Now consider the matrices 


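\[ A = \begin{pmatrix} 2 & 1 & 3 \\ 1 & 2 & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 3 & 1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix}. \]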


Both products AB and BA are defined, but they are different sizes, so 
they cannot be equal. What sizes are they? 


Activity 1.11 Answer the question just posed concerning the sizes of 
AB and BA. Multiply the matrices to find the two product matrices, 
AB and BA. 


Even if both products are defined and the same size, it is still generally
true that $AB \ne BA$.
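
A computer makes it easy to see this for a particular pair of square matrices. In the sketch below, the function `multiply` and the two $2 \times 2$ matrices P and Q are hypothetical choices of ours; you may like to replace them with matrices of your own.

```python
def multiply(A, B):
    """Return AB for matrices stored as lists of rows."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    return [
        [sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
        for row in A
    ]

# Illustrative matrices, chosen by us.
P = [[1, 2], [3, 4]]
Q = [[0, 1], [1, 0]]

PQ = multiply(P, Q)   # multiplying by Q on the right swaps the columns of P
QP = multiply(Q, P)   # multiplying by Q on the left swaps the rows of P

print(PQ, QP, PQ == QP)
```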


Activity 1.12 Investigate this last claim. Write down two different
$2 \times 2$ matrices A and B and find the products AB and BA. For example,
you could use


1.4 Matrix algebra 


Matrices are useful because they provide a compact notation and we 
can perform algebra with them. 
For example, given a matrix equation such as

\[ 3A + 2B = 2(B - A + C), \]

we can solve this for the matrix C using the rules of algebra. You must
always bear in mind that to perform the operations they must be defined.
In this equation, it is understood that all the matrices A, B and C are
the same size, say $m \times n$.

We list the rules of algebra satisfied by the operations of addition, 
scalar multiplication and matrix multiplication. The sizes of the matrices 
are dictated by the operations being defined. The first rule is that addition 
is ‘commutative’: 


• $A + B = B + A$.


This is easily shown to be true. The matrices A and B must be of the
same size, say $m \times n$, for the operation to be defined, so both $A + B$
and $B + A$ are $m \times n$ matrices for some m and n. They also have the
same entries. The $(i, j)$ entry of $A + B$ is $a_{ij} + b_{ij}$ and the $(i, j)$ entry
of $B + A$ is $b_{ij} + a_{ij}$, but $a_{ij} + b_{ij} = b_{ij} + a_{ij}$ by the properties of real
numbers. So the matrices $A + B$ and $B + A$ are equal.

On the other hand, as we have seen, matrix multiplication is not
commutative: $AB \ne BA$ in general.

We have the following ‘associative’ laws: 


• $(A + B) + C = A + (B + C)$,
• $\lambda(AB) = (\lambda A)B = A(\lambda B)$,
• $(AB)C = A(BC)$.


These rules allow us to remove brackets. For example, the last rule says 
that we will get the same result if we first multiply AB and then multiply 
by C on the right as we will if we first multiply BC and then multiply 
by A on the left, so the choice is ours. 

We can show that all these rules follow from the definitions of the 
operations, just as we showed the commutativity of addition. We need 
to know that the matrices on the left and on the right of the equals sign
have the same size and that corresponding entries are equal. Only the
associativity of multiplication presents any complications, but you just
need to carefully write down the $(i, j)$ entry of each side and show that,
by rearranging terms, they are equal.


Activity 1.13 Think about these rules. What sizes are each of the
matrices? Write down the $(i, j)$ entry for each of the matrices $\lambda(AB)$
and $(\lambda A)B$ and prove that the matrices are equal.


Similarly, we have three ‘distributive’ laws: 


• $A(B + C) = AB + AC$,
• $(B + C)A = BA + CA$,
• $\lambda(A + B) = \lambda A + \lambda B$.


Why do we need both of the first two rules (which state that matrix
multiplication distributes through addition)? Well, since matrix
multiplication is not commutative, we cannot conclude the second distributive
rule from the first; we have to prove it is true separately. These
statements can be proved from the definitions of the operations, as above,
but we will not take the time to do this here.

If A is an $m \times n$ matrix, what is the result of $A - A$? We obtain an
$m \times n$ matrix all of whose
entries are 0. This is an ‘additive identity’; that is, it plays the same
role for matrices as the number 0 does for numbers, in the sense that
$A + 0 = 0 + A = A$. There is a zero matrix of any size $m \times n$.


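Definition 1.14 (Zero matrix) A zero matrix, denoted 0, is an $m \times n$
matrix with all entries zero:

\[ \begin{pmatrix} 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 0 & 0 \end{pmatrix}. \]

Then:

• $A + 0 = A$,
• $A - A = 0$,
• $0A = 0$, $A0 = 0$,

where the sizes of the zero matrices above must be compatible with the
size of the matrix A.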
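
The rules listed so far can be spot-checked numerically. The sketch below does this for one particular choice of $2 \times 2$ matrices and one scalar; the helper functions and all the values are our own, and agreement in a single case illustrates the rules but does not prove them.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(lam, A):
    return [[lam * a for a in row] for row in A]

def multiply(A, B):
    n = len(B)
    return [
        [sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
        for row in A
    ]

# Illustrative 2 x 2 matrices and scalar, chosen by us.
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
C = [[2, 2], [-1, 3]]
Z = [[0, 0], [0, 0]]     # the 2 x 2 zero matrix
lam = 3

checks = {
    "associativity of addition": add(add(A, B), C) == add(A, add(B, C)),
    "scalars and products": scale(lam, multiply(A, B))
        == multiply(scale(lam, A), B)
        == multiply(A, scale(lam, B)),
    "associativity of multiplication":
        multiply(multiply(A, B), C) == multiply(A, multiply(B, C)),
    "left distributive": multiply(A, add(B, C))
        == add(multiply(A, B), multiply(A, C)),
    "right distributive": multiply(add(B, C), A)
        == add(multiply(B, A), multiply(C, A)),
    "scalar distributive": scale(lam, add(A, B))
        == add(scale(lam, A), scale(lam, B)),
    "additive identity": add(A, Z) == A,
    "additive inverse": add(A, scale(-1, A)) == Z,
    "zero products": multiply(Z, A) == Z and multiply(A, Z) == Z,
}

for rule, holds in checks.items():
    print(rule, holds)
```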

We also have a ‘multiplicative identity’, which acts like the number 1
does for multiplication of numbers.

Definition 1.15 (Identity matrix) The $n \times n$ identity matrix, denoted
$I_n$ or simply I, is the diagonal matrix with $a_{ii} = 1$,

\[ I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}. \]

If A is any $m \times n$ matrix, then:

• $AI = A$ and $IA = A$,

where it is understood that the identity matrices are the appropriate size
for the products to be defined.
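
As a brief illustration, the sketch below builds identity matrices and checks that a particular square matrix is unchanged when multiplied by the identity on either side. The function names and the matrix M are our own choices; a square matrix is used so that the same identity matrix works on both sides.

```python
def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def multiply(A, B):
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    return [
        [sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
        for row in A
    ]

# An illustrative 3 x 3 matrix, chosen by us.
M = [[2, -1, 0], [4, 3, 7], [1, 1, 5]]
I3 = identity(3)

right_product = multiply(M, I3)
left_product = multiply(I3, M)

print(I3)
print(right_product == M, left_product == M)
```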


Activity 1.16 What size is the identity matrix if A is $m \times n$ and
$IA = A$?


Example 1.17 We can apply these rules to solve the equation
$3A + 2B = 2(B - A + C)$ for C. We will pedantically apply each rule
so that you can see how it is being used. In practice, you don’t need to
put in all these steps, just implicitly use the rules of algebra. We begin
by removing the brackets using the distributive rule.

\begin{align*}
3A + 2B &= 2B - 2A + 2C && \text{(distributive rule)} \\
3A + 2B - 2B &= 2B - 2A + 2C - 2B && \text{(add $-2B$ to both sides)} \\
3A + (2B - 2B) &= -2A + 2C + (2B - 2B) && \text{(commutativity, associativity of addition)} \\
3A + 0 &= -2A + 2C + 0 && \text{(additive inverse)} \\
3A &= -2A + 2C && \text{(additive identity)} \\
3A + 2A &= -2A + 2C + 2A && \text{(add $2A$ to both sides)} \\
5A &= 2C && \text{(commutativity, associativity of addition, additive identity)} \\
C &= \tfrac{5}{2} A && \text{(scalar multiplication).}
\end{align*}
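
The conclusion of this example can be checked on a computer for any particular A and B. In the sketch below the matrices are hypothetical ones chosen by us, and exact fractions are used so that the factor 5/2 introduces no rounding error.

```python
from fractions import Fraction

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(lam, A):
    return [[lam * a for a in row] for row in A]

# Illustrative 2 x 2 matrices, chosen by us.
A = [[1, -2], [0, 4]]
B = [[3, 1], [7, -5]]

C = scale(Fraction(5, 2), A)    # the solution found in the example

left = add(scale(3, A), scale(2, B))                    # 3A + 2B
right = scale(2, add(add(B, scale(-1, A)), C))          # 2(B - A + C)

print(left == right)
```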


1.5 Matrix inverses 
1.5.1 The inverse of a matrix 


If AB = AC, can we conclude that B = C? The answer is ‘no’, as the 
following example shows. 
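
Example 1.18 If

\[ A = \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & -1 \\ 3 & 5 \end{pmatrix}, \quad C = \begin{pmatrix} 8 & 0 \\ -4 & 4 \end{pmatrix}, \]

then the matrices B and C are not equal, but

\[ AB = AC = \begin{pmatrix} 0 & 0 \\ 4 & 4 \end{pmatrix}. \]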


Activity 1.19 Check this by multiplying out the matrices. 
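
The failure of cancellation can also be seen numerically. In the sketch below, A is the matrix with a zero first row used in this section; the particular B and C, like the function name `multiply`, should be read as illustrative choices of ours. Because the first row of A is zero and its second row is (1, 1), the product of A with any matrix has a zero first row and the column sums of that matrix as its second row, so different matrices with the same column sums give the same product.

```python
def multiply(A, B):
    n = len(B)
    return [
        [sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
        for row in A
    ]

A = [[0, 0], [1, 1]]
B = [[1, -1], [3, 5]]    # column sums 4 and 4
C = [[8, 0], [-4, 4]]    # different entries, same column sums

AB = multiply(A, B)
AC = multiply(A, C)

print(AB, AC, B == C)
```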


On the other hand, if $A + 5B = A + 5C$, then we can conclude that
$B = C$ because the operations of addition and scalar multiplication
have inverses. If we have a matrix A, then the matrix $-A = (-1)A$ is
an additive inverse because it satisfies $A + (-A) = 0$. If we multiply a
matrix A by a non-zero scalar c, we can ‘undo’ this by multiplying cA
by $1/c$.

What about matrix multiplication? Is there a multiplicative inverse? 
The answer is ‘sometimes’. 


Definition 1.20 (Inverse matrix) The n × n matrix A is invertible if 
there is a matrix B such that 

\[ AB = BA = I, \]

where I is the n × n identity matrix. The matrix B is called the inverse 
of A and is denoted by $A^{-1}$. 

Notice that the matrix A must be square, and that both I and $B = A^{-1}$ 
must also be square n × n matrices, for the products to be defined. 


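Example 1.21 Let $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$. Then with 

\[ B = \begin{pmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{pmatrix} \]

we have AB = BA = I, so $B = A^{-1}$. 


Activity 1.22 Check this. Multiply the matrices to show that AB = I 
and BA = I, where I is the 2 × 2 identity matrix. 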


You might have noticed that we have said that B is the inverse of A. 
This is because an invertible matrix has only one inverse. We will prove 
this. 


Theorem 1.23 If A is an n × n invertible matrix, then the matrix $A^{-1}$ 
is unique. 

Proof. Assume the matrix A has two inverses, B and C, so that 
AB = BA = I and AC = CA = I. We will show that B and C must 
actually be the same matrix; that is, that they are equal. Consider the 
product CAB. Since matrix multiplication is associative and AB = I, 
we have 

\[ CAB = C(AB) = CI = C. \]

On the other hand, again by associativity, 

\[ CAB = (CA)B = IB = B, \]

since CA = I. We conclude that C = B, so there is only one inverse 
matrix of A. 


Not all square matrices will have an inverse. We say that A is invertible 
or non-singular if it has an inverse. We say that A is non-invertible or 
singular if it has no inverse. 

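For example, the matrix 

\[ \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix} \]

(used in Example 1.18 of this section) is not invertible. It is not possible 
for a matrix to satisfy 

\[ \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \]

since the (1,1) entry of the product is 0 and 0 ≠ 1. 

On the other hand, if 

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad \text{where } ad - bc \neq 0, \]

then A has the inverse 

\[ A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}. \]


Activity 1.24 Check that this is indeed the inverse of A, by showing 
that if you multiply A on the left or on the right by this matrix, then you 
obtain the identity matrix I. 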


This tells us how to find the inverse of any 2 × 2 invertible matrix. If 

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \]

the scalar ad − bc is called the determinant of the matrix A, denoted 
|A|. We shall see more about the determinant in Chapter 3. So if 
|A| = ad − bc ≠ 0, then to construct $A^{-1}$ we take the matrix A, switch 
the main diagonal entries and put minus signs in front of the other two 
entries, then multiply by the scalar 1/|A|. 
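
The recipe just described translates directly into a few lines of code. The sketch below is one possible rendering in plain Python, assuming integer (or `Fraction`) entries so that the arithmetic stays exact; the names `inverse_2x2` and `matmul` and the test matrix M are hypothetical choices made for illustration.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]] by the determinant formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0, so the matrix has no inverse")
    k = Fraction(1, det)
    # Switch the diagonal entries, negate the other two, scale by 1/|A|.
    return [[k * d, k * -b], [k * -c, k * a]]

def matmul(X, Y):
    """Product XY of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

M = [[2, 1], [5, 3]]          # illustrative matrix with |M| = 1
M_inv = inverse_2x2(M)
identity = [[1, 0], [0, 1]]

# Both products must give the identity for M_inv to be the inverse.
print(matmul(M, M_inv) == identity and matmul(M_inv, M) == identity)
```

The function raises an error when ad − bc = 0, which mirrors the fact that such a matrix has no inverse.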


Activity 1.25 Use this to find the inverse of the matrix $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ and 
check your answer by looking at Example 1.21. 


If AB = AC, and A is invertible, can we conclude that B = C? This 
time the answer is ‘yes’, because we can multiply each side of the 
equation on the left by $A^{-1}$: 

\[ A^{-1}AB = A^{-1}AC \implies IB = IC \implies B = C. \]

But be careful! If AB = CA, then we cannot conclude that B = C, 
only that $B = A^{-1}CA$. 

It is not possible to ‘divide’ by a matrix. We can only multiply on 
the right or left by the inverse matrix. 


1.5.2 Properties of the inverse 


If A is an invertible matrix, then, by definition, $A^{-1}$ exists and 
$AA^{-1} = A^{-1}A = I$. This statement also says that the matrix A is the 
inverse of $A^{-1}$; that is, 

• $(A^{-1})^{-1} = A$. 

It is important to understand the definition of an inverse matrix and 
be able to use it. Essentially, if we can find a matrix that satisfies the 
definition, then that matrix is the inverse, and the matrix is invertible. 
For example, if A is an invertible n × n matrix and λ is a non-zero 
scalar, then: 

• $(\lambda A)^{-1} = \frac{1}{\lambda} A^{-1}$. 

This statement says that the matrix λA is invertible, and its inverse 
is given by the matrix $C = (1/\lambda)A^{-1}$. To prove this is true, we just 
need to show that the matrix C satisfies (λA)C = C(λA) = I. This is 
straightforward using matrix algebra: 

\[ (\lambda A)\left(\frac{1}{\lambda} A^{-1}\right) = \lambda \frac{1}{\lambda} AA^{-1} = I \quad \text{and} \quad \left(\frac{1}{\lambda} A^{-1}\right)(\lambda A) = \frac{1}{\lambda} \lambda A^{-1}A = I. \]


If A and B are invertible n x n matrices, then using the definition of 
the inverse you can show the following important fact: 


• $(AB)^{-1} = B^{-1}A^{-1}$. 

This last statement says that if A and B are invertible matrices of the 
same size, then the product AB is invertible and its inverse is the product 
of the inverses in the reverse order. The proof of this statement is left 
as an exercise. (See Exercise 1.3.) 
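
Before attempting the proof, it can help to see the reverse-order rule at work on numbers. The sketch below is a numerical illustration only, not a proof: it uses two arbitrarily chosen invertible 2 × 2 matrices and the hypothetical helpers `inverse_2x2` and `matmul`, with exact `Fraction` arithmetic.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]], assuming ad - bc is non-zero."""
    (a, b), (c, d) = M
    k = Fraction(1, a * d - b * c)
    return [[k * d, k * -b], [k * -c, k * a]]

def matmul(X, Y):
    """Product XY of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

# Arbitrary invertible matrices chosen for illustration.
A = [[2, 1], [5, 3]]
B = [[1, 2], [3, 4]]

inverse_of_product = inverse_2x2(matmul(A, B))
reverse_order = matmul(inverse_2x2(B), inverse_2x2(A))   # B^{-1} A^{-1}
same_order = matmul(inverse_2x2(A), inverse_2x2(B))      # A^{-1} B^{-1}

print(inverse_of_product == reverse_order)
print(inverse_of_product == same_order)
```

For these two matrices the product of the inverses in the original order gives a different matrix, which is why the order matters.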


1.6 Powers of a matrix 


If A is a square matrix, what do we mean by $A^2$? We naturally mean the 
product of A with itself, $A^2 = AA$. In the same way, if A is an n × n 
matrix and $r \in \mathbb{N}$, then: 

\[ A^r = \underbrace{AA \cdots A}_{r \text{ times}}. \]

Powers of matrices obey a number of rules, similar to powers of numbers. 
First, if A is an invertible n × n matrix and $r \in \mathbb{N}$, then: 

• $(A^r)^{-1} = (A^{-1})^r$. 

This follows immediately from the definition of an inverse matrix and 
the associativity of matrix multiplication. Think about what it says: that 
the inverse of the product of A times itself r times is the product of $A^{-1}$ 
times itself r times. 

The usual rules of exponents hold: for integers r, s, 

• $A^r A^s = A^{r+s}$, 
• $(A^r)^s = A^{rs}$. 

As r and s are positive integers and matrix multiplication is associative, 
these properties are easily verified in the same way as they are with real 
numbers. 


Activity 1.26 Verify the above three properties. 
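
The exponent rules can also be tried out on a concrete matrix. The following plain Python sketch computes powers by repeated multiplication, exactly as in the definition. The helper names are hypothetical, the matrix A is an arbitrary illustrative choice, and starting the loop from the identity (so that `power(A, 0)` returns I) is a convenience assumed here rather than something defined in the text.

```python
def matmul(X, Y):
    """Product XY of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def identity(n):
    """The n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def power(A, r):
    """A multiplied by itself r times, for a square matrix A and integer r >= 0."""
    result = identity(len(A))
    for _ in range(r):
        result = matmul(result, A)
    return result

A = [[1, 1], [0, 1]]   # illustrative matrix
r, s = 2, 3

print(matmul(power(A, r), power(A, s)) == power(A, r + s))   # A^r A^s = A^(r+s)
print(power(power(A, r), s) == power(A, r * s))              # (A^r)^s = A^(rs)
```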


1.7 The transpose and symmetric matrices 
1.7.1 The transpose of a matrix 


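If we interchange the rows and columns of a matrix, we obtain another 
matrix, known as its transpose. 

Definition 1.27 (Transpose) The transpose of an m × n matrix 

\[ A = (a_{ij}) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \]

is the n × m matrix 

\[ A^T = (a_{ji}) = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix}. \]

So, on forming the transpose of a matrix, row i of A becomes column 
i of $A^T$. 


Example 1.28 If $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ and $B = (1 \;\; 5 \;\; 3)$, then 

\[ A^T = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}, \qquad B^T = \begin{pmatrix} 1 \\ 5 \\ 3 \end{pmatrix}. \]

Notice that the diagonal entries of a square matrix do not move under 
the operation of taking the transpose, as $a_{ii}$ remains $a_{ii}$. So if D is a 
diagonal matrix, then $D^T = D$. 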


1.7.2 Properties of the transpose 


If we take the transpose of a matrix A by switching the rows and 
columns, and then take the transpose of the resulting matrix, then we 
get back to the original matrix A. This is summarised in the following 
equation: 


• $(A^T)^T = A$. 

Two further properties relate to scalar multiplication and addition: 

• $(\lambda A)^T = \lambda A^T$ and 
• $(A + B)^T = A^T + B^T$. 

These follow immediately from the definition. In particular, the (i, j) 
entry of $(\lambda A)^T$ is $\lambda a_{ji}$, which is also the (i, j) entry of $\lambda A^T$. 

The next property tells you what happens when you take the transpose 
of a product of matrices: 

• $(AB)^T = B^T A^T$. 

This can be stated as: The transpose of the product of two matrices is 
the product of the transposes in the reverse order. 

Showing that this is true is slightly more complicated, since it 
involves matrix multiplication. It is more important to understand why 
the product of the transposes must be in the reverse order: the following 
activity explores this. 


Activity 1.29 If A is an m × n matrix and B is n × p, look at the 
sizes of the matrices $(AB)^T$, $A^T$, $B^T$. Show that only the product 
$B^T A^T$ is always defined. Show also that its size is equal to the size 
of $(AB)^T$. 


If A is an m × n matrix and B is n × p, then, from Activity 1.29, you 
know that $(AB)^T$ and $B^T A^T$ are the same size. To prove that 
$(AB)^T = B^T A^T$, you need to show that the (i, j) entries are equal. You 
can try this as follows. 


Activity 1.30 The (i, j) entry of $(AB)^T$ is the (j, i) entry of AB, which 
is obtained by taking row j of A and multiplying each term by the 
corresponding entry of column i of B. We can write this as 

\[ \left((AB)^T\right)_{ij} = a_{j1}b_{1i} + a_{j2}b_{2i} + \cdots + a_{jn}b_{ni}. \]

Do the same for the (i, j) entry of $B^T A^T$ and show that you obtain the 
same number. 
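
The reverse-order rule for transposes can be seen on a small example. In the plain Python sketch below, `transpose` and `matmul` are hypothetical helpers and the matrices are arbitrary illustrative choices: A is 2 × 3 and B is 3 × 2, so AB is 2 × 2.

```python
def transpose(M):
    """Rows of M become the columns of the result."""
    return [list(column) for column in zip(*M)]

def matmul(X, Y):
    """Product XY of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, 2, 3], [4, 5, 6]]      # 2 x 3, illustrative
B = [[1, 0], [2, 1], [0, 3]]    # 3 x 2, illustrative

left = transpose(matmul(A, B))                # (AB)^T
right = matmul(transpose(B), transpose(A))    # B^T A^T

print(left)
print(left == right)
```

With these particular sizes the product of the transposes in the original order also happens to be defined, but it is a 3 × 3 matrix, so it cannot equal the 2 × 2 matrix on the left.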


The final property in this section states that the inverse of the transpose 
of an invertible matrix is the transpose of the inverse; that is, if A is 
invertible, then: 


• $(A^T)^{-1} = (A^{-1})^T$. 

This follows from the previous property and the definition of inverse. 
We have 

\[ A^T (A^{-1})^T = (A^{-1}A)^T = I^T = I \]

and, in the same way, $(A^{-1})^T A^T = I$. Therefore, by the definition of the 
inverse of a matrix, $(A^{-1})^T$ must be the inverse of $A^T$. 


1.7.3. Symmetric matrices 


Definition 1.31 (Symmetric matrix) A matrix A is symmetric if it is 
equal to its transpose, $A = A^T$. 

Only square matrices can be symmetric. If A is symmetric, then 
$a_{ij} = a_{ji}$. That is, entries diagonally opposite to each other must be equal, or, 
in other words, the matrix is symmetric about its diagonal. 


Activity 1.32 Fill in the missing numbers if the matrix A is symmetric: 

\[ A = \begin{pmatrix} 1 & 4 & \square \\ \square & 2 & -7 \\ 5 & \square & 3 \end{pmatrix} = A^T. \]


If D is a diagonal matrix, then $d_{ij} = 0 = d_{ji}$ for all $i \neq j$. So all 
diagonal matrices are symmetric. 


1.8 Vectors in $\mathbb{R}^n$ 
1.8.1 Vectors 


An n × 1 matrix is a column vector, or simply a vector 

\[ \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}, \]

where each $v_i$ is a real number. The numbers $v_1, v_2, \ldots, v_n$ are known 
as the components (or entries) of the vector $\mathbf{v}$. 

We can also define a row vector to be a 1 x n matrix. 

In this text, when we simply use the term vector, we shall mean a 
column vector. 

In order to distinguish vectors from scalars, and to emphasise that 
they are vectors and not general matrices, we will write vectors in 
lowercase boldface type. (When writing by hand, vectors should be 
underlined to avoid confusion with scalars.) 

Addition and scalar multiplication are defined for vectors as for 
n × 1 matrices: 

\[ \mathbf{v} + \mathbf{w} = \begin{pmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{pmatrix}, \qquad \lambda\mathbf{v} = \begin{pmatrix} \lambda v_1 \\ \lambda v_2 \\ \vdots \\ \lambda v_n \end{pmatrix}. \]

For a fixed positive integer n, the set of vectors (together with the 
operations of addition and scalar multiplication) form the set $\mathbb{R}^n$, usually 
called Euclidean n-space. 

We will often write a column vector as the transpose of a row vector. 
Although 

\[ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = (x_1 \;\; x_2 \;\; \cdots \;\; x_n)^T, \]

we will usually write $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$, with commas separating 
the entries. A matrix does not have commas; however, we will use them 
in order to clearly distinguish the separate components of the vector. 

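For vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ in $\mathbb{R}^n$ and scalars $\alpha_1, \alpha_2, \ldots, \alpha_k$ in $\mathbb{R}$, the 
vector 

\[ \mathbf{v} = \alpha_1\mathbf{v}_1 + \cdots + \alpha_k\mathbf{v}_k \in \mathbb{R}^n \]

is known as a linear combination of the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$. 

A zero vector, denoted $\mathbf{0}$, is a vector with all of its entries equal 
to 0. There is one zero vector in each space $\mathbb{R}^n$. As with matrices, 
this vector is an additive identity, meaning that for any vector $\mathbf{v} \in \mathbb{R}^n$, 
$\mathbf{0} + \mathbf{v} = \mathbf{v} + \mathbf{0} = \mathbf{v}$. Further, multiplying any vector $\mathbf{v}$ by the scalar zero 
results in the zero vector: $0\mathbf{v} = \mathbf{0}$. 

Although the matrix product of two vectors $\mathbf{v}$ and $\mathbf{w}$ in $\mathbb{R}^n$ cannot 
be calculated, it is possible to form the matrix products $\mathbf{v}^T\mathbf{w}$ and $\mathbf{v}\mathbf{w}^T$. 
The first is a 1 × 1 matrix, and the latter is an n × n matrix. 


Activity 1.33 Calculate $\mathbf{a}^T\mathbf{b}$ and $\mathbf{a}\mathbf{b}^T$ for $\mathbf{a} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$, $\mathbf{b} = \begin{pmatrix} 4 \\ -2 \\ 1 \end{pmatrix}$. 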


1.8.2 The inner product of two vectors 


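For $\mathbf{v}, \mathbf{w} \in \mathbb{R}^n$, the 1 × 1 matrix $\mathbf{v}^T\mathbf{w}$ can be identified with the real 
number, or scalar, which is its unique entry. This turns out to be particularly 
useful, and is known as the inner product of $\mathbf{v}$ and $\mathbf{w}$. 


Definition 1.34 (inner product) Given two vectors 

\[ \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}, \qquad \mathbf{w} = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}, \]

the inner product, denoted $\langle \mathbf{v}, \mathbf{w} \rangle$, is the real number given by 

\[ \langle \mathbf{v}, \mathbf{w} \rangle = \left\langle \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}, \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} \right\rangle = v_1w_1 + v_2w_2 + \cdots + v_nw_n. \]

The inner product, $\langle \mathbf{v}, \mathbf{w} \rangle$, is also known as the scalar product of $\mathbf{v}$ and 
$\mathbf{w}$, or as the dot product. In the latter case, it is denoted by $\mathbf{v} \cdot \mathbf{w}$. 

The inner product of $\mathbf{v}$ and $\mathbf{w}$ is precisely the scalar quantity (that 
is, the number) given by 

\[ \mathbf{v}^T\mathbf{w} = (v_1 \;\; v_2 \;\; \cdots \;\; v_n) \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} = v_1w_1 + v_2w_2 + \cdots + v_nw_n, \]

so that we can write 

\[ \langle \mathbf{v}, \mathbf{w} \rangle = \mathbf{v}^T\mathbf{w}. \]

Example 1.35 If $\mathbf{x} = (1, 2, 3)^T$ and $\mathbf{y} = (2, -1, 1)^T$, then 

\[ \langle \mathbf{x}, \mathbf{y} \rangle = 1(2) + 2(-1) + 3(1) = 3. \]

It is important to realise that the inner product is just a number, a scalar, 
not another vector or a matrix. 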


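The inner product on $\mathbb{R}^n$ satisfies certain basic properties as shown 
in the next theorem. 


Theorem 1.36 The inner product 

\[ \langle \mathbf{x}, \mathbf{y} \rangle = x_1y_1 + x_2y_2 + \cdots + x_ny_n, \qquad \mathbf{x}, \mathbf{y} \in \mathbb{R}^n, \]

satisfies the following properties for all $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^n$ and for all $\alpha \in \mathbb{R}$: 

(i) $\langle \mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{y}, \mathbf{x} \rangle$, 
(ii) $\alpha\langle \mathbf{x}, \mathbf{y} \rangle = \langle \alpha\mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, \alpha\mathbf{y} \rangle$, 
(iii) $\langle \mathbf{x} + \mathbf{y}, \mathbf{z} \rangle = \langle \mathbf{x}, \mathbf{z} \rangle + \langle \mathbf{y}, \mathbf{z} \rangle$, 
(iv) $\langle \mathbf{x}, \mathbf{x} \rangle \geq 0$, and $\langle \mathbf{x}, \mathbf{x} \rangle = 0$ if and only if $\mathbf{x} = \mathbf{0}$. 


Proof. We have, by properties of real numbers, 

\[ \langle \mathbf{x}, \mathbf{y} \rangle = x_1y_1 + x_2y_2 + \cdots + x_ny_n = y_1x_1 + y_2x_2 + \cdots + y_nx_n = \langle \mathbf{y}, \mathbf{x} \rangle, \]

which proves (i). We leave the proofs of (ii) and (iii) as an exercise. For 
(iv), note that 

\[ \langle \mathbf{x}, \mathbf{x} \rangle = x_1^2 + x_2^2 + \cdots + x_n^2 \]

is a sum of squares, so $\langle \mathbf{x}, \mathbf{x} \rangle \geq 0$, and $\langle \mathbf{x}, \mathbf{x} \rangle = 0$ if and only if each 
term $x_i^2$ is equal to 0; that is, if and only if each $x_i = 0$, so $\mathbf{x}$ is the zero 
vector, $\mathbf{x} = \mathbf{0}$. 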
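
The four properties can be tried out on specific vectors. The sketch below is a numerical illustration rather than a proof; it assumes vectors are stored as Python lists, the name `inner` is a hypothetical helper, and the vector z and scalar alpha are arbitrary choices (x and y are the vectors of Example 1.35).

```python
def inner(x, y):
    """Inner product of two vectors with the same number of components."""
    if len(x) != len(y):
        raise ValueError("vectors must lie in the same space R^n")
    return sum(a * b for a, b in zip(x, y))

x = [1, 2, 3]
y = [2, -1, 1]
z = [0, 4, -2]     # arbitrary third vector
alpha = 5          # arbitrary scalar

x_plus_y = [a + b for a, b in zip(x, y)]
alpha_x = [alpha * a for a in x]

print(inner(x, y) == inner(y, x))                         # property (i)
print(alpha * inner(x, y) == inner(alpha_x, y))           # property (ii)
print(inner(x_plus_y, z) == inner(x, z) + inner(y, z))    # property (iii)
print(inner(x, x) >= 0, inner([0, 0, 0], [0, 0, 0]) == 0) # property (iv)
```

The length check in `inner` reflects the fact that the inner product is only defined for two vectors in the same space.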


Activity 1.37 Prove properties (ii) and (iii). Show, also, that these two 
properties are equivalent to the single property 

\[ \langle \alpha\mathbf{x} + \beta\mathbf{y}, \mathbf{z} \rangle = \alpha\langle \mathbf{x}, \mathbf{z} \rangle + \beta\langle \mathbf{y}, \mathbf{z} \rangle. \]

From the definitions, it is clear that it is not possible to combine vectors 
in different Euclidean spaces, either by addition or by taking the inner 
product. If $\mathbf{v} \in \mathbb{R}^n$ and $\mathbf{w} \in \mathbb{R}^m$, with $m \neq n$, then these vectors live in 
different ‘worlds’, or, more precisely, in different ‘vector spaces’. 


1.8.3 Vectors and matrices 


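If A is an m × n matrix, then the columns of A are vectors in $\mathbb{R}^m$. If 
$\mathbf{x} \in \mathbb{R}^n$, then the product $A\mathbf{x}$ is an m × 1 matrix, so is also a vector in 
$\mathbb{R}^m$. There is a fundamental relationship between these vectors, which 
we present here as an example of matrix manipulation. We list it as a 
theorem so that we can refer back to it later. 


Theorem 1.38 Let A be an m × n matrix 

\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \]

and denote the columns of A by the column vectors $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$, so 
that 

\[ \mathbf{c}_i = \begin{pmatrix} a_{1i} \\ a_{2i} \\ \vdots \\ a_{mi} \end{pmatrix}, \qquad i = 1, \ldots, n. \]

Then if $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$ is any vector in $\mathbb{R}^n$, 

\[ A\mathbf{x} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n. \]

The theorem states that the matrix product $A\mathbf{x}$, which is a vector in $\mathbb{R}^m$, 
can be expressed as a linear combination of the column vectors of A. 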
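Before you look at the proof, try to carry out the calculation yourself, to 
see how it works. Just write both the left-hand side and the right-hand 
side of the equality as a single m × 1 vector. 

Proof. We have 

\begin{align*}
A\mathbf{x} &= \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \\
&= \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{pmatrix} \\
&= x_1 \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix} + x_2 \begin{pmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{pmatrix} + \cdots + x_n \begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{pmatrix} \\
&= x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n.
\end{align*}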


There are many useful ways to view this relationship, as we shall see in 
later chapters. 
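
The two sides of Theorem 1.38 correspond to two different ways of computing the same vector. The plain Python sketch below computes Ax row by row and then again as a linear combination of the columns of A; the function names and the particular matrix and vector are hypothetical choices made for illustration.

```python
def mat_vec(A, x):
    """Ax computed row by row: each entry is a row of A times x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def column_combination(A, x):
    """x_1 c_1 + x_2 c_2 + ... + x_n c_n, where c_j is column j of A."""
    m = len(A)
    result = [0] * m
    for j, xj in enumerate(x):
        column = [A[i][j] for i in range(m)]
        result = [r + xj * c for r, c in zip(result, column)]
    return result

A = [[1, 2, 0],
     [3, -1, 4]]     # a 2 x 3 matrix, so Ax lies in R^2
x = [2, 1, -1]       # a vector in R^3

print(mat_vec(A, x))
print(column_combination(A, x))
```

Both calls produce the same vector in R^2, as the theorem asserts.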


1.9 Developing geometric insight 


Vectors have a broader use beyond that of being special types of matrices. 
It is possible that you have some previous knowledge of vectors; for 
example, in describing the displacement of an object from one point to 
another in $\mathbb{R}^2$ or in $\mathbb{R}^3$. Before we continue our study of linear algebra, 
it is important to consolidate this background, for it provides valuable 
geometric insight into the definitions and uses of vectors in higher 
dimensions. Parts of the next section may be a review for you. 


1.9.1 Vectors in $\mathbb{R}^2$ 


The set $\mathbb{R}$ can be represented as points along a horizontal line, called a 
real-number line. In order to represent pairs of real numbers, $(a_1, a_2)$, 
we use a Cartesian plane, a plane with both a horizontal axis and a 
vertical axis, each axis being a copy of the real-number line, and we 
mark $A = (a_1, a_2)$ as a point in this plane. We associate this point with 
the vector $\mathbf{a} = (a_1, a_2)^T$, as representing a displacement from the origin 
(the point (0, 0)) to the point A. In this context, $\mathbf{a}$ is the position vector 
of the point A. This displacement is illustrated by an arrow, or directed 
line segment, with the initial point at the origin and the terminal point 
at A, as shown in Figure 1.1. 

Even if a displacement does not begin at the origin, two displacements 
of the same length and the same direction are considered to be 
equal. So, for example, the two arrows in Figure 1.2 represent the same 
vector, $\mathbf{v} = (1, 2)^T$. 

If an object is displaced from a point, say (0, 0), the origin, to a 
point P by the displacement $\mathbf{p}$, and then displaced from P to Q by the 
displacement $\mathbf{v}$, then the total displacement is given by the vector from 
0 to Q, which is the position vector $\mathbf{q}$. So we would expect vectors to 
satisfy $\mathbf{q} = \mathbf{p} + \mathbf{v}$, both geometrically (in the sense of a displacement) 
and algebraically (by the definition of vector addition). This is certainly 
true in general, as illustrated in Figure 1.3. 

Figure 1.1 A position vector, $\mathbf{a}$ 

Figure 1.2 Displacement vectors, $\mathbf{v}$ 

Figure 1.3 If $\mathbf{v} = (v_1, v_2)^T$, then $q_1 = p_1 + v_1$ and $q_2 = v_2 + p_2$ 

Figure 1.4 $\mathbf{p} + \mathbf{v} = \mathbf{v} + \mathbf{p}$ 

Figure 1.5 A right-angled triangle to determine the length of a vector 


The order of displacements does not matter (nor does the order of 
vector addition), so q = v + p. For this reason, the addition of vectors 
is said to follow the parallelogram law. This is illustrated in Figure 1.4. 

From the equation $\mathbf{q} = \mathbf{p} + \mathbf{v}$, we have $\mathbf{v} = \mathbf{q} - \mathbf{p}$. This is the 
displacement from P to Q. To help you determine in which direction the 
vector $\mathbf{v}$ points, think of $\mathbf{v} = \mathbf{q} - \mathbf{p}$ as the vector which is added to the 
vector $\mathbf{p}$ in order to obtain the vector $\mathbf{q}$. 

If $\mathbf{v}$ represents a displacement, then $2\mathbf{v}$ must represent a displacement 
in the same direction, but twice as far, and $-\mathbf{v}$ represents an equal 
displacement in the opposite direction. This interpretation is compatible 
with the definition of scalar multiplication. 


Activity 1.39 Sketch the vector $\mathbf{v} = (1, 2)^T$ in a coordinate system. 
Then sketch $2\mathbf{v}$ and $-\mathbf{v}$. Looking at the coordinates on your sketch, 
what are the components of $2\mathbf{v}$ and $-\mathbf{v}$? 


We have stated that a vector has both a length and a direction. Given 
a vector $\mathbf{a} = (a_1, a_2)^T$, its length, denoted by $\|\mathbf{a}\|$, can be calculated 
using Pythagoras’ theorem applied to the right triangle shown in 
Figure 1.5. 

So the length of $\mathbf{a}$ is the scalar quantity 

\[ \|\mathbf{a}\| = \sqrt{a_1^2 + a_2^2}. \]

The length of a vector can be expressed in terms of the inner product, 

\[ \|\mathbf{a}\| = \sqrt{\langle \mathbf{a}, \mathbf{a} \rangle}, \]

simply because $\langle \mathbf{a}, \mathbf{a} \rangle = a_1^2 + a_2^2$. A unit vector is a vector of length 1. 

Example 1.40 If $\mathbf{v} = (1, 2)^T$, then $\|\mathbf{v}\| = \sqrt{1^2 + 2^2} = \sqrt{5}$. The vector 

\[ \mathbf{u} = \left( \frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}} \right)^T \]

is a unit vector in the same direction as $\mathbf{v}$. 

Activity 1.41 Check this. Calculate the length of $\mathbf{u}$. 


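The direction of a vector is essentially given by the components of 
the vector. If we have two vectors $\mathbf{a}$ and $\mathbf{b}$ which are (non-zero) scalar 
multiples, say 

\[ \mathbf{a} = \lambda\mathbf{b}, \qquad \lambda \in \mathbb{R}, \quad (\lambda \neq 0), \]

then $\mathbf{a}$ and $\mathbf{b}$ are parallel. If $\lambda > 0$, then $\mathbf{a}$ and $\mathbf{b}$ have the same direction. 
If $\lambda < 0$, then we say that $\mathbf{a}$ and $\mathbf{b}$ have opposite directions. 

The zero vector, $\mathbf{0}$, has length 0 and has no direction. For any other 
vector, $\mathbf{v} \neq \mathbf{0}$, there is one unit vector in the same direction as $\mathbf{v}$, namely 

\[ \mathbf{u} = \frac{1}{\|\mathbf{v}\|}\mathbf{v}. \]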
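
The formula for the unit vector is easy to express in code. The sketch below, in plain Python with floating-point arithmetic, uses the vector of Example 1.40; the function names `norm` and `unit` are hypothetical, and results are compared with a tolerance because square roots are not computed exactly.

```python
import math

def norm(v):
    """Length of a vector: the square root of the sum of squared components."""
    return math.sqrt(sum(c * c for c in v))

def unit(v):
    """Unit vector in the same direction as a non-zero vector v."""
    length = norm(v)
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return [c / length for c in v]

v = [1.0, 2.0]
u = unit(v)

print(u)
print(math.isclose(norm(u), 1.0))   # u should have length 1
```

Multiplying `u` by a negative scalar of absolute value k would give a vector of length k pointing in the opposite direction.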


Activity 1.42 Write down a unit vector, $\mathbf{u}$, which is parallel to the 
vector $\mathbf{a} = (4, 3)^T$. Then write down a vector, $\mathbf{w}$, of length 2 which is 
in the opposite direction to $\mathbf{a}$. 


1.9.2 Inner product 


The inner product in $\mathbb{R}^2$ is closely linked with the geometrical concepts 
of length and angle. If $\mathbf{a} = (a_1, a_2)^T$, we have already seen that 

\[ \|\mathbf{a}\|^2 = \langle \mathbf{a}, \mathbf{a} \rangle = a_1^2 + a_2^2. \]

Let $\mathbf{a}$, $\mathbf{b}$ be two vectors in $\mathbb{R}^2$, and let θ denote the angle between 
them.¹ By this we shall always mean the angle θ such that 0 ≤ θ ≤ π. 
If θ < π, the vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c} = \mathbf{b} - \mathbf{a}$ form a triangle, where $\mathbf{c}$ is the 
side opposite the angle θ, as, for example, in Figure 1.6. 

The law of cosines (which is a generalisation of Pythagoras’ theorem) 
applied to this triangle gives us the important relationship stated 
in the following theorem. 


¹ Angles are always measured in radians, not degrees, here. So, for example, 45 degrees is π/4 radians. 

Figure 1.6 Two vectors and the angle between them 


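Theorem 1.43 Let $\mathbf{a}, \mathbf{b} \in \mathbb{R}^2$ and let θ denote the angle between them. 
Then 

\[ \langle \mathbf{a}, \mathbf{b} \rangle = \|\mathbf{a}\| \, \|\mathbf{b}\| \cos\theta. \]


Proof. The law of cosines states that $c^2 = a^2 + b^2 - 2ab\cos\theta$, where 
$c = \|\mathbf{b} - \mathbf{a}\|$, $a = \|\mathbf{a}\|$, $b = \|\mathbf{b}\|$. That is, 

\[ \|\mathbf{b} - \mathbf{a}\|^2 = \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - 2\|\mathbf{a}\| \, \|\mathbf{b}\| \cos\theta. \qquad (1) \]

Expanding the inner product and using its properties, we have 

\[ \|\mathbf{b} - \mathbf{a}\|^2 = \langle \mathbf{b} - \mathbf{a}, \mathbf{b} - \mathbf{a} \rangle = \langle \mathbf{b}, \mathbf{b} \rangle + \langle \mathbf{a}, \mathbf{a} \rangle - 2\langle \mathbf{a}, \mathbf{b} \rangle, \]

so that 

\[ \|\mathbf{b} - \mathbf{a}\|^2 = \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - 2\langle \mathbf{a}, \mathbf{b} \rangle. \qquad (2) \]

Comparing equations (1) and (2) above, we conclude that 

\[ \langle \mathbf{a}, \mathbf{b} \rangle = \|\mathbf{a}\| \, \|\mathbf{b}\| \cos\theta. \]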


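Theorem 1.43 has many geometrical consequences. For example, we 
can use it to find the angle between two vectors by using 

\[ \cos\theta = \frac{\langle \mathbf{a}, \mathbf{b} \rangle}{\|\mathbf{a}\| \, \|\mathbf{b}\|}. \]


Example 1.44 Let $\mathbf{v} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$, $\mathbf{w} = \begin{pmatrix} 3 \\ 1 \end{pmatrix}$, and let θ be the angle 
between them. Then 

\[ \cos\theta = \frac{5}{\sqrt{5}\sqrt{10}} = \frac{1}{\sqrt{2}}, \]

so that $\theta = \frac{\pi}{4}$. 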
Since 

\[ \langle \mathbf{a}, \mathbf{b} \rangle = \|\mathbf{a}\| \, \|\mathbf{b}\| \cos\theta, \]

and since −1 ≤ cos θ ≤ 1 for any real number θ, the maximum value 
of the inner product is $\langle \mathbf{a}, \mathbf{b} \rangle = \|\mathbf{a}\| \, \|\mathbf{b}\|$. This occurs precisely when 
cos θ = 1; that is, when θ = 0. In this case, the vectors $\mathbf{a}$ and $\mathbf{b}$ are 
parallel and in the same direction. If they point in opposite directions, 
then θ = π and we have $\langle \mathbf{a}, \mathbf{b} \rangle = -\|\mathbf{a}\| \, \|\mathbf{b}\|$. The inner product will be 
positive if and only if the angle between the vectors is acute, meaning 
that $0 \leq \theta < \frac{\pi}{2}$. It will be negative if the angle is obtuse, meaning that 
$\frac{\pi}{2} < \theta \leq \pi$. 

The non-zero vectors $\mathbf{a}$ and $\mathbf{b}$ are orthogonal (or perpendicular 
or, sometimes, normal) when the angle between them is $\theta = \frac{\pi}{2}$. Since 
$\cos(\frac{\pi}{2}) = 0$, this is precisely when their inner product is zero. We restate 
this important fact: 

• The vectors $\mathbf{a}$ and $\mathbf{b}$ are orthogonal if and only if $\langle \mathbf{a}, \mathbf{b} \rangle = 0$. 


1.9.3 Vectors in $\mathbb{R}^3$ 


Everything we have said so far about the geometrical interpretation of 
vectors and the inner product in $\mathbb{R}^2$ extends to $\mathbb{R}^3$. In particular, if 

\[ \mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}, \]

then 

\[ \|\mathbf{a}\| = \sqrt{a_1^2 + a_2^2 + a_3^2}. \]


Activity 1.45 Show this. Sketch a position vector $\mathbf{a} = (a_1, a_2, a_3)^T$ in 
$\mathbb{R}^3$. Drop a perpendicular to the xy-plane as in Figure 1.7, and apply 
Pythagoras’ theorem twice to obtain the result. 


The vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c} = \mathbf{b} - \mathbf{a}$ in $\mathbb{R}^3$ lie in a plane and the law of 
cosines can still be applied to establish the result that 

\[ \langle \mathbf{a}, \mathbf{b} \rangle = \|\mathbf{a}\| \, \|\mathbf{b}\| \cos\theta, \]

where θ is the angle between the vectors. 

Figure 1.7 Diagram for Activity 1.45 

Figure 1.8 The line y = 2x. The vector shown is $\mathbf{v} = (1, 2)^T$ 


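Activity 1.46 Calculate the angles of the triangle with sides $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ and 
show it is an isosceles right-angled triangle, where 

\[ \mathbf{a} = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} -1 \\ 1 \\ 4 \end{pmatrix}, \qquad \mathbf{c} = \mathbf{b} - \mathbf{a}. \]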


1.10 Lines 


1.10.1 Lines in $\mathbb{R}^2$ 


In $\mathbb{R}^2$, a line is given by a single Cartesian equation, such as y = ax + b, 
and, as such, we can draw a graph of the line in the xy-plane. This line 
can also be expressed as a single vector equation with one parameter. 
To see this, look at the following examples. 


Example 1.47 Consider the line y = 2x. Any point (x, y) on this line 
must satisfy this equation, and all points that satisfy the equation are on 
this line (Figure 1.8). 

Another way to describe the points on the line is by giving their 
position vectors. We can let x = t, where t is any real number. Then y 
is determined by y = 2x = 2t. So if $\mathbf{x} = (x, y)^T$ is the position vector 
of a point on the line, then 

\[ \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} t \\ 2t \end{pmatrix} = t \begin{pmatrix} 1 \\ 2 \end{pmatrix} = t\mathbf{v}, \qquad t \in \mathbb{R}. \]

For example, if t = 2, we get the position vector of the point (2, 4) on 
the line, and if t = −1, we obtain the point (−1, −2). As the parameter 
t runs through all real numbers, this vector equation gives the position 
vectors of all the points on the line. 

Starting with the vector equation 

\[ \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} = t\mathbf{v} = t \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad t \in \mathbb{R}, \]

we can retrieve the Cartesian equation using the fact that the two vectors 
are equal if and only if their components are equal. This gives us the 
two equations x = t and y = 2t. Eliminating the parameter t between 
these two equations yields y = 2x. 


The line in the above example is a line through the origin. What about 
a line which does not contain (0, 0)? 


Example 1.48 Consider the line y = 2x + 1. Proceeding as above, we 
set x = t, $t \in \mathbb{R}$. Then y = 2x + 1 = 2t + 1, so the position vector of 
a point on this line is given by 

\[ \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} t \\ 2t + 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} + t \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \mathbf{p} + t\mathbf{v}, \qquad t \in \mathbb{R}. \]

We can interpret this as follows. To locate any point on the line, first 
locate one particular point which is on the line, for example the y 
intercept, (0, 1). Then the position vector of any point on the line is a 
sum of two displacements, first going to the point (0, 1) and then going 
along the line, in a direction parallel to the vector $\mathbf{v} = (1, 2)^T$. It is 
important to notice that in this case the actual position vector of a point 
on the line does not lie along the line. Only if the line goes through the 
origin will that happen. 


Activity 1.49 Sketch the line y = 2x + 1 and the position vector $\mathbf{q}$ 
of the point (3, 7) which is on this line. Express $\mathbf{q}$ as the sum of two 
vectors, $\mathbf{q} = \mathbf{p} + t\mathbf{v}$ where $\mathbf{p} = (0, 1)^T$ and $\mathbf{v} = (1, 2)^T$ for some $t \in \mathbb{R}$, 
and add these vectors to your sketch. 


In the vector equation, any point on the line can be used to locate the 
line, and any vector parallel to the direction vector, $\mathbf{v}$, can be used to 
give the direction. So, for example, 

\[ \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} + s \begin{pmatrix} -2 \\ -4 \end{pmatrix}, \qquad s \in \mathbb{R}, \]

is also a vector equation of this line. 

Figure 1.9 The line y = 2x + 1. The vector shown is $\mathbf{v} = (1, 2)^T$ 

Activity 1.50 If $\mathbf{q} = (3, 7)^T$, what is s in this expression of the line? 


As before, we can retrieve the Cartesian equation of the line by equating 
components of the vector and eliminating the parameter. 


Activity 1.51 Do this for each of the vector equations given above for 
the line y = 2x + 1. 


In general, any line in $\mathbb{R}^2$ is given by a vector equation with one 
parameter of the form 

\[ \mathbf{x} = \mathbf{p} + t\mathbf{v}, \]

where $\mathbf{x}$ is the position vector of a point on the line, $\mathbf{p}$ is the position 
vector of any particular point on the line and $\mathbf{v}$ is the direction of the line. 


Activity 1.52 Write down a vector equation of the line through the 
points P = (—1, 1) and Q = (3, 2). What is the direction of this line? 
Find a value for c such that the point (7, c) is on the line. 


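In $\mathbb{R}^2$, two lines are either parallel or intersect in a unique point. 


Example 1.53 The lines $\ell_1$ and $\ell_2$ are given by the following equations 
for $t \in \mathbb{R}$: 

\[ \ell_1: \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} + t \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad \ell_2: \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \end{pmatrix} + t \begin{pmatrix} -2 \\ 1 \end{pmatrix}. \]

These lines are not parallel, since their direction vectors are not scalar 
multiples of one another. Therefore, they intersect in a unique point. We 
can find this point either by finding the Cartesian equation of each line 
and solving the equations simultaneously, or using the vector equations. 
We will do the latter. We are looking for a point (x, y) on both lines, so 
its position vector will satisfy 

\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} + t \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \end{pmatrix} + s \begin{pmatrix} -2 \\ 1 \end{pmatrix} \]

for some $t \in \mathbb{R}$ and for some $s \in \mathbb{R}$. We need to use different symbols 
(s and t) in the equations because they are unlikely to be the same 
number for each line. We are looking for values of s and t which will 
give us the same point. Equating components of the position vectors of 
points on the lines, we have 

\[ \begin{cases} 1 + t = 5 - 2s \\ 3 + 2t = 6 + s \end{cases} \implies \begin{cases} 2s + t = 4 \\ -s + 2t = 3 \end{cases} \implies \begin{cases} 2s + t = 4 \\ -2s + 4t = 6. \end{cases} \]

Adding these last two equations, we obtain t = 2, and therefore s = 1. 
Therefore, the point of intersection is (3, 7): 

\[ \begin{pmatrix} 1 \\ 3 \end{pmatrix} + 2 \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \end{pmatrix} + 1 \begin{pmatrix} -2 \\ 1 \end{pmatrix}. \]

What is the angle of intersection of these two lines? Since 

\[ \left\langle \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} -2 \\ 1 \end{pmatrix} \right\rangle = 0, \]

the lines are perpendicular. 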
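
The calculation in Example 1.53 can be automated. The text solves the two equations by adding them; the sketch below instead uses Cramer's rule on the same pair of equations, which is a substituted technique chosen here because it is short to code. It is written in plain Python with exact `Fraction` arithmetic, assumes integer components, and the function name `intersect_2d` is hypothetical.

```python
from fractions import Fraction

def intersect_2d(p, v, q, w):
    """Solve p + t v = q + s w; return (t, s, point), or None if v and w are parallel."""
    # Rearranged, the equations are t*v - s*w = q - p: two equations in t and s.
    det = v[0] * (-w[1]) - (-w[0]) * v[1]
    if det == 0:
        return None          # direction vectors are scalar multiples
    r = (q[0] - p[0], q[1] - p[1])
    t = Fraction(r[0] * (-w[1]) - (-w[0]) * r[1], det)
    s = Fraction(v[0] * r[1] - r[0] * v[1], det)
    point = (p[0] + t * v[0], p[1] + t * v[1])
    return t, s, point

# The two lines of Example 1.53.
result = intersect_2d((1, 3), (1, 2), (5, 6), (-2, 1))
print(result)
```

The function returns `None` for parallel direction vectors, in which case the lines either have no common point or coincide.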


1.10.2 Lines in $\mathbb{R}^3$ 


How can you describe a line in $\mathbb{R}^3$? Because there are three variables 
involved, the natural way is to use a vector equation. To describe a line, 
you locate one point on the line by its position vector, and then travel 
along from that point in a given direction, or in the opposite direction 
(Figure 1.10). 

Therefore, a line in $\mathbb{R}^3$ is given by a vector equation with one 
parameter, 

\[ \mathbf{x} = \mathbf{p} + t\mathbf{v}, \qquad t \in \mathbb{R}, \]

where $\mathbf{x}$ is the position vector of any point on the line, $\mathbf{p}$ is the position 
vector of one particular point on the line and $\mathbf{v}$ is the direction of the line, 

\[ \mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} + t \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}, \qquad t \in \mathbb{R}. \qquad (1.10.2) \]

The equation $\mathbf{x} = t\mathbf{v}$ represents a parallel line through the origin. 


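Example 1.54 The equations 

\[ \mathbf{x} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} \quad \text{and} \quad \mathbf{x} = \begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix} + s \begin{pmatrix} -3 \\ -6 \\ 3 \end{pmatrix}, \qquad s, t \in \mathbb{R}, \]

describe the same line. This is not obvious, so how do we show it? 

Figure 1.10 A line in $\mathbb{R}^3$ 

The lines represented by these equations are parallel since their direction 
vectors are parallel, 

\[ \begin{pmatrix} -3 \\ -6 \\ 3 \end{pmatrix} = -3 \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \]

so they either have no points in common and are parallel, or they have 
all points in common and are really the same line. Since 

\[ \begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + 2 \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \]

the point (3, 7, 2) is on both lines, so they must have all points in 
common. We say that the lines are coincident. 

On the other hand, the lines represented by the equations 

\[ \mathbf{x} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} \quad \text{and} \quad \mathbf{x} = \begin{pmatrix} 3 \\ 7 \\ 1 \end{pmatrix} + t \begin{pmatrix} -3 \\ -6 \\ 3 \end{pmatrix}, \qquad t \in \mathbb{R}, \]

are parallel, with no points in common, since there is no value of t for 
which 

\[ \begin{pmatrix} 3 \\ 7 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}. \]


Activity 1.55 Verify this last statement. 


Now try the following: 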


Activity 1.56 Write down a vector equation of the line through the 
points P = (—1, 1, 2) and Q = (3, 2, 1). What is the direction of this 
line? Is the point (7, 1,3) on this line? Suppose you want a point on 
this line of the form (c, d, 3). Find one such point. How many choices 
do you actually have for the values of c and d? 


We can also describe a line in $\mathbb{R}^3$ by Cartesian equations, but this time 
we need two such equations because there are three variables. Equating 
components in the vector equation (1.10.2) above, we have 

\[ x = p_1 + tv_1, \qquad y = p_2 + tv_2, \qquad z = p_3 + tv_3. \]

Solving each of these equations for the parameter t and equating the 
results, we have the two equations 

\[ \frac{x - p_1}{v_1} = \frac{y - p_2}{v_2} = \frac{z - p_3}{v_3}, \qquad \text{provided } v_i \neq 0, \; i = 1, 2, 3. \]

Example 1.57 To find Cartesian equations of the line 

\[ \mathbf{x} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + t \begin{pmatrix} -1 \\ 0 \\ 5 \end{pmatrix}, \qquad t \in \mathbb{R}, \]

we equate components 

\[ x = 1 - t, \qquad y = 2, \qquad z = 3 + 5t, \]

and then solve for t in the first and third equation. The Cartesian 
equations are 

\[ 1 - x = \frac{z - 3}{5} \quad \text{and} \quad y = 2. \]

This is a line parallel to the xz-plane in $\mathbb{R}^3$. The direction vector has a 
0 in the second component, so there is no change in the y direction: the 
y coordinate has the constant value y = 2. 


In $\mathbb{R}^2$, two lines are either parallel or intersect in a unique point. In $\mathbb{R}^3$, 
more can happen. Two lines in $\mathbb{R}^3$ either intersect in a unique point, are 
parallel or are skew, which means that they lie in parallel planes and are 
not parallel. 

Try to imagine what skew lines look like. If you are in a room with 
a ceiling parallel to the floor, imagine a line drawn in the ceiling. It 
is possible for you to draw a parallel line in the floor, but instead it is 
easier to draw a line in the floor which is not parallel to the one in the 
ceiling. These lines will be skew; they lie in parallel planes (the ceiling 
and the floor). If you could move the skew line in the floor onto the 
ceiling, then the lines would intersect in a unique point. 

Two lines are said to be coplanar if they lie in the same plane, in 
which case they are either parallel or intersecting. 


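Example 1.58 Are the following lines L_1 and L_2 intersecting, parallel
or skew?

\[ L_1: \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad t \in \mathbb{R}, \]

\[ L_2: \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \\ 1 \end{pmatrix} + t \begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}, \qquad t \in \mathbb{R}. \]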


Activity 1.59 Clearly, the lines are not parallel. Why?

The lines intersect if there is a point (x, y, z) on both lines; that is, if
there exist values of the parameters, s, t such that

\[ \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \\ 1 \end{pmatrix} + s \begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}. \]

Equating components, we need to solve the three simultaneous equations
in two unknowns,

\[ \begin{aligned} 1 + t &= 5 - 2s \\ 3 + 2t &= 6 + s \\ 4 - t &= 1 + 7s \end{aligned} \qquad \Longrightarrow \qquad \begin{aligned} 2s + t &= 4 \\ -s + 2t &= 3 \\ 7s + t &= 3. \end{aligned} \]

We have already seen in Example 1.53 that the first two equations have
the unique solution, s = 1, t = 2. Substituting these values into the
third equation,

\[ 7s + t = 7 + 2 \neq 3, \]

we see that the system has no solution. Therefore, the lines do not
intersect and must be skew.
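
The test used in this example (check the direction vectors, solve two of the component equations, then try the third) can be sketched in code. The following is a minimal illustration rather than a definitive routine: the function name `classify_lines` is hypothetical, integer coordinates are assumed, and coincident lines are simply reported as parallel.

```python
from fractions import Fraction
from itertools import combinations


def classify_lines(p, v, q, w):
    """Classify the lines x = p + t*v and x = q + s*w in R^3.

    Returns ("parallel", None), ("skew", None) or ("intersect", point).
    Assumes integer coordinates and non-zero direction vectors.
    """
    pairs = list(combinations(range(3), 2))
    # v and w are parallel exactly when every 2x2 minor v_i*w_j - v_j*w_i is 0.
    if all(v[i] * w[j] - v[j] * w[i] == 0 for i, j in pairs):
        return ("parallel", None)
    # p + t*v = q + s*w gives t*v_k - s*w_k = d_k for each component k.
    d = [q[k] - p[k] for k in range(3)]
    # Pick two component equations that determine t and s uniquely.
    i, j = next((i, j) for i, j in pairs if v[i] * w[j] - v[j] * w[i] != 0)
    det = w[i] * v[j] - v[i] * w[j]
    t = Fraction(w[i] * d[j] - d[i] * w[j], det)   # Cramer's rule
    s = Fraction(v[i] * d[j] - v[j] * d[i], det)
    k = 3 - i - j                                  # the remaining component
    if t * v[k] - s * w[k] != d[k]:
        return ("skew", None)
    return ("intersect", tuple(p[m] + t * v[m] for m in range(3)))


# The two lines of Example 1.58.
kind, point = classify_lines((1, 3, 4), (1, 2, -1), (5, 6, 1), (-2, 1, 7))
# kind == "skew"
```

Exact rational arithmetic (`Fraction`) is used so that the final equality test is not affected by rounding.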


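Example 1.60 On the other hand, if we take a new line L_3, which is
parallel to L_2 but which passes through the point (5, 6, −5), then the lines

\[ L_1: \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad L_3: \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \\ -5 \end{pmatrix} + t \begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}, \qquad t \in \mathbb{R}, \]

do intersect in the unique point (3, 7, 2).


Activity 1.61 Check this. Find the point of intersection of the two lines
L_1 and L_3.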


1.11 Planes in R^3


On a line, there is essentially one direction in which a point can move, 
given as all possible scalar multiples of a given direction, but on a 
plane there are more possibilities. A point can move in two different 
directions, and in any linear combination of these two directions. So 
how do we describe a plane in R^3?

The vector parametric equation

\[ \mathbf{x} = \mathbf{p} + s\mathbf{v} + t\mathbf{w}, \qquad s, t \in \mathbb{R}, \]

describes the position vectors of points on a plane in R^3 provided that
the vectors v and w are non-zero and are not parallel. The vector p is
the position vector of any particular point on the plane and the vectors
v and w are displacement vectors which lie in the plane. By taking all
possible linear combinations x = p + sv + tw, for s, t ∈ R, we obtain
all the points on the plane.

The equation 


\[ \mathbf{x} = s\mathbf{v} + t\mathbf{w}, \qquad s, t \in \mathbb{R}, \]


describes a plane through the origin. In this case, the position vector, x, 
of any point on the plane lies in the plane. 


Activity 1.62 If v and w are parallel, what does the equation
x = p + sv + tw, s, t ∈ R, actually represent?


Example 1.63 You have shown that the lines L_1 and L_3 given in
Example 1.60 intersect in the point (3, 7, 2). Two intersecting lines determine
a plane. A vector equation of the plane containing the two lines is given by

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix} + s \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t \begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}, \qquad s, t \in \mathbb{R}. \]

Why? We know that (3, 7, 2) is a point on the plane, and the directions
of each of the lines must lie in the plane. As s and t run through all
real numbers, this equation gives the position vector of all points on
the plane. Since the point (3, 7, 2) is on both lines, if t = 0 we have the
equation of L_1, and if s = 0, we get L_3.

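Any point which is on the plane can take the place of the vector
(3, 7, 2)^T, and any non-parallel vectors which are linear combinations
of v and w can replace these in the equation. So, for example,

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + s \begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix} + t \begin{pmatrix} -3 \\ -1 \\ 8 \end{pmatrix}, \qquad s, t \in \mathbb{R}, \]

is also an equation of this plane.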




Activity 1.64 Verify this. Show that (1, 3, 4) is a point on the plane
given by each equation, and show that (−3, −1, 8)^T is a linear
combination of (1, 2, −1)^T and (−2, 1, 7)^T.


There is another way to describe a plane in R^3 geometrically, which
is often easier to use. We begin with planes through the origin. Let n
be a given vector in R^3 and consider all position vectors x which are
orthogonal to n. Geometrically, the set of all such vectors describes a
plane through the origin in R^3.

Try to imagine this by placing a pencil perpendicular to a table 
top. The pencil represents a normal vector, the table top a plane, and 
the point where the pencil is touching the table is the origin of your 
coordinate system. Then any vector which you can draw on the table 
top is orthogonal to the pencil, and conversely any point on the table 
top can be reached by a directed line segment (from the point where the 
pencil touches the table) which is orthogonal to the pencil. 

A vector x is orthogonal to n if and only if

\[ \langle \mathbf{n}, \mathbf{x} \rangle = 0, \]

so this equation gives the position vectors, x, of points on the plane. If
n = (a, b, c)^T and x = (x, y, z)^T, then this equation can be written as

\[ \left\langle \begin{pmatrix} a \\ b \\ c \end{pmatrix}, \begin{pmatrix} x \\ y \\ z \end{pmatrix} \right\rangle = 0 \qquad \text{or} \qquad ax + by + cz = 0. \]

This is a Cartesian equation of a plane through the origin in R^3. The
vector n is called a normal vector to the plane. Any vector which is
parallel to n will also be a normal vector and will lead to the same
Cartesian equation.

On the other hand, given any Cartesian equation of the form 


ax + by + cz = 0, 


then this equation represents a plane through the origin in R^3 with
normal vector n = (a, b, c)^T.


To describe a plane which does not go through the origin, we choose 
a normal vector n and one point P on the plane with position vector 
p. We then consider all displacement vectors which lie in the plane 
with initial point at P. If x is the position vector of any point on the 
plane, then the displacement vector x — p lies in the plane, and x — p is 
orthogonal to n. Conversely, if the position vector x of a point satisfies
⟨n, x − p⟩ = 0, then the vector x − p lies in the plane, so the point (with
position vector x) is on the plane.

(Again, think about the pencil perpendicular to the table top, only 
this time the point where the pencil is touching the table is a point, P, 
on the plane, and the origin of your coordinate system is somewhere 
else; say, in the corner on the floor.)

The orthogonality condition means that the position vector of any
point on the plane is given by the equation

\[ \langle \mathbf{n}, \mathbf{x} - \mathbf{p} \rangle = 0. \]

Using properties of the inner product, we can rewrite this as

\[ \langle \mathbf{n}, \mathbf{x} \rangle = \langle \mathbf{n}, \mathbf{p} \rangle, \]

where ⟨n, p⟩ = d is a constant.
If n = (a, b, c)^T and x = (x, y, z)^T, then

\[ ax + by + cz = d \]

is a Cartesian equation of a plane in R^3. The plane goes through the
origin if and only if d = 0.


Example 1.65 The equation

\[ 2x - 3y - 5z = 2 \]

represents a plane which does not go through the origin, since (x, y, z) =
(0, 0, 0) does not satisfy the equation. To find a point on the plane, we
can choose any two of the coordinates, say y = 0 and z = 0, and then
the equation tells us that x = 1. So the point (1, 0, 0) is on this plane.
The components of a normal to the plane can be read from this equation
as the coefficients of x, y, z: n = (2, −3, −5)^T.


How does the Cartesian equation of a plane relate to the vector para- 
metric equation of a plane? A Cartesian equation can be obtained from 
the vector equation algebraically, by eliminating the parameters in the 
vector equation, and vice versa, as the following example shows. 


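Example 1.66 Consider the plane

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = s \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t \begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}, \qquad s, t \in \mathbb{R}, \]

which is a plane through the origin parallel to the plane in Example 1.63.
The direction vectors v = (1, 2, −1)^T and w = (−2, 1, 7)^T lie in the
plane.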

To obtain a Cartesian equation in x, y and z, we equate the components
in this vector equation,

\[ \begin{aligned} x &= s - 2t \\ y &= 2s + t \\ z &= -s + 7t \end{aligned} \]

and eliminate the parameters s and t. We begin by solving the first
equation for s, and then substitute this into the second equation to solve
for t in terms of x and y,

\[ s = x + 2t \ \Longrightarrow\ y = 2(x + 2t) + t = 2x + 5t \ \Longrightarrow\ 5t = y - 2x \ \Longrightarrow\ t = \frac{y - 2x}{5}. \]

We then substitute back into the first equation to obtain s in terms of x
and y,

\[ s = x + 2\left(\frac{y - 2x}{5}\right) \ \Longrightarrow\ 5s = 5x + 2y - 4x \ \Longrightarrow\ s = \frac{x + 2y}{5}. \]

Finally, we substitute for s and t in the third equation, z = −s + 7t, and
simplify to obtain a Cartesian equation of the plane

\[ 3x - y + z = 0. \]


Activity 1.67 Carry out this last step to obtain the Cartesian equation 
of the plane. 


This Cartesian equation can be expressed as ⟨n, x⟩ = 0, where

\[ \mathbf{n} = \begin{pmatrix} 3 \\ -1 \\ 1 \end{pmatrix}, \qquad \mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \]

The vector n is a normal vector to the plane. We can check that n is,
indeed, orthogonal to the plane by taking the inner product with the
vectors v and w, which lie in the plane.


Activity 1.68 Do this. Calculate ⟨n, v⟩ and ⟨n, w⟩, and verify that both
inner products are equal to 0.


Since n is orthogonal to both v and w, it is orthogonal to all linear
combinations of these vectors, and hence to any vector in the plane.
So this plane can equally be described as the set of all position vectors
which are orthogonal to n.


Activity 1.69 Using the properties of inner product, show that this
last statement is true. That is, if ⟨n, v⟩ = 0 and ⟨n, w⟩ = 0, then
⟨n, sv + tw⟩ = 0, for any s, t ∈ R.
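
As a numerical companion to these activities, the short sketch below computes the inner products of n = (3, −1, 1) with v and w from Example 1.66, and then spot-checks a grid of linear combinations sv + tw. The helper names `inner` and `plane_point` are illustrative choices, and checking finitely many points is of course no substitute for the algebraic argument asked for in Activity 1.69.

```python
def inner(a, b):
    """Inner product of two vectors given as equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


n = (3, -1, 1)     # normal vector read off from 3x - y + z = 0
v = (1, 2, -1)     # direction vectors of the plane in Example 1.66
w = (-2, 1, 7)

n_dot_v = inner(n, v)   # 3 - 2 - 1 = 0
n_dot_w = inner(n, w)   # -6 - 1 + 7 = 0


def plane_point(s, t):
    """Position vector s*v + t*w of a point on the plane."""
    return tuple(s * vi + t * wi for vi, wi in zip(v, w))


# Every sampled linear combination is orthogonal to n.
all_orthogonal = all(
    inner(n, plane_point(s, t)) == 0
    for s in range(-3, 4)
    for t in range(-3, 4)
)
```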


Can we do the same for a plane which does not pass through the origin? 
Consider the following example.

Example 1.70 The plane we just considered in Example 1.66 is parallel
to the plane with vector equation

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix} + s \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t \begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix} = \mathbf{p} + s\mathbf{v} + t\mathbf{w}, \qquad s, t \in \mathbb{R}, \]

which passes through the point (3, 7, 2). Since the planes are parallel,
they will have the same normal vectors. So the Cartesian equation of
this plane is of the form

\[ 3x - y + z = d. \]

Since (3, 7, 2) is a point on the plane, it must satisfy the equation for the
plane. Substituting into the equation we find d = 3(3) − (7) + (2) = 4
(which is equivalent to finding d by using d = ⟨n, p⟩). So the Cartesian
equation we obtain is

\[ 3x - y + z = 4. \]


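Conversely, starting with a Cartesian equation of a plane, we can obtain
a vector equation. Consider the plane just discussed. We are looking for
the position vector of a point on the plane whose components satisfy
3x − y + z = 4, or, equivalently, z = 4 − 3x + y. (We can solve for
any one of the variables x, y or z, but we chose z for simplicity.) So we
are looking for all vectors x such that

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y \\ 4 - 3x + y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 4 \end{pmatrix} + x \begin{pmatrix} 1 \\ 0 \\ -3 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \]

for any x, y ∈ R. Therefore,

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 4 \end{pmatrix} + s \begin{pmatrix} 1 \\ 0 \\ -3 \end{pmatrix} + t \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \qquad s, t \in \mathbb{R}, \]

is a vector equation of the same plane as that given by the original vector
equation,

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix} + s \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t \begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}, \qquad s, t \in \mathbb{R}. \]

It is difficult to spot at a glance that these two different vector equations
in fact describe the same plane. There are many ways to show this, but
we can use what we know about planes to find the easiest. The planes
represented by the two vector equations have the same normal vector n,
since the vectors (1, 0, −3)^T and (0, 1, 1)^T are also orthogonal to n. So
we know that the two vector equations represent parallel planes. They
are the same plane if they have a point in common. It is far easier to
find values of s and t for which p = (3, 7, 2)^T satisfies the new vector
equation

\[ \begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 4 \end{pmatrix} + s \begin{pmatrix} 1 \\ 0 \\ -3 \end{pmatrix} + t \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \]

than the other way around (which is by showing that (0, 0, 4) satisfies
the original equation) because of the positions of the zeros and ones in
these direction vectors.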
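
The two conversions in this example can be sketched as a small program: d is computed as the inner product of n with a known point, and a vector equation is produced by solving the Cartesian equation for z. This is only an illustration under stated assumptions: the function name `cartesian_to_vector` is hypothetical, coefficients are taken to be integers, and the coefficient c of z must be non-zero (otherwise one would solve for x or y instead).

```python
from fractions import Fraction


def inner(a, b):
    """Inner product of two vectors given as equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def cartesian_to_vector(n, d):
    """Turn ax + by + cz = d (with c != 0) into a point and two directions.

    Solving for z gives z = (d - ax - by)/c, so with x = s and y = t the
    plane is  point + s*v + t*w.
    """
    a, b, c = n
    if c == 0:
        raise ValueError("this sketch solves for z, so it needs c != 0")
    point = (0, 0, Fraction(d, c))
    v = (1, 0, Fraction(-a, c))
    w = (0, 1, Fraction(-b, c))
    return point, v, w


n = (3, -1, 1)
p = (3, 7, 2)
d = inner(n, p)                            # 9 - 7 + 2 = 4
point, v, w = cartesian_to_vector(n, d)    # (0,0,4), (1,0,-3), (0,1,1)

# p is recovered with s = 3, t = 7, as Activity 1.71 asks you to spot.
recovered = tuple(point[k] + 3 * v[k] + 7 * w[k] for k in range(3))
```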




Activity 1.71 Do this. You should be able to immediately spot the 
values of s and t which work. 


Using the examples we have just done, you should now be able to tackle 
the following activity: 


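Activity 1.72 The two lines, L_1 and L_2,

\[ L_1: \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad L_2: \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \\ 1 \end{pmatrix} + t \begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}, \qquad t \in \mathbb{R}, \]

in Example 1.58 are skew, and therefore are contained in parallel planes.
Find vector equations and Cartesian equations for these two planes.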


Two planes in R^3 are either parallel or intersect in a line. Considering
such questions, it is usually easier to use the Cartesian equations of 
the planes. If the planes are parallel, then this will be obvious from 
looking at their normal vectors. If they are not parallel, then the line 
of intersection can be found by solving the two Cartesian equations 
simultaneously. 


Example 1.73 The planes

\[ x + 2y - 3z = 0 \qquad \text{and} \qquad -2x - 4y + 6z = 4 \]

are parallel, since their normal vectors are related by

\[ (-2, -4, 6)^T = -2(1, 2, -3)^T. \]

The equations do not represent the same plane, since they have no points
in common; that is, there are no values of x, y, z which can satisfy both
equations. The first plane goes through the origin and the second plane
does not.
On the other hand, the planes

\[ x + 2y - 3z = 0 \qquad \text{and} \qquad x - 2y + 5z = 4 \]

intersect in a line. The points of intersection are the points (x, y, z)
which satisfy both equations, so we solve the equations simultaneously.
We begin by eliminating the variable x from the second equation, by
subtracting the first equation from the second. This will naturally lead
us to a vector equation of the line of intersection:

\[ \begin{aligned} x + 2y - 3z &= 0 \\ x - 2y + 5z &= 4 \end{aligned} \qquad \Longrightarrow \qquad \begin{aligned} x + 2y - 3z &= 0 \\ -4y + 8z &= 4. \end{aligned} \]

This last equation tells us that if z = t is any real number, then
y = −1 + 2t. Substituting these expressions into the first equation, we find
x = 2 − t. Then a vector equation of the line of intersection is

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2 - t \\ -1 + 2t \\ t \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix} + t \begin{pmatrix} -1 \\ 2 \\ 1 \end{pmatrix}. \]

This can be verified by showing that the point (2, −1, 0) satisfies both
Cartesian equations, and that the vector v = (−1, 2, 1)^T is orthogonal
to the normal vectors of each of the planes (and therefore lies in both
planes).
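
The verification described here is easy to mechanise. The sketch below, with illustrative names `inner` and `line_point`, checks the point (2, −1, 0) and the direction (−1, 2, 1) against both planes, and then samples several values of t to confirm that the resulting points satisfy both Cartesian equations. It checks the answer found by hand; it does not carry out the elimination itself.

```python
def inner(a, b):
    """Inner product of two vectors given as equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# The two intersecting planes, each stored as (normal vector, constant d).
n1, d1 = (1, 2, -3), 0      # x + 2y - 3z = 0
n2, d2 = (1, -2, 5), 4      # x - 2y + 5z = 4

p = (2, -1, 0)              # point on the proposed line
v = (-1, 2, 1)              # direction of the proposed line

point_ok = inner(n1, p) == d1 and inner(n2, p) == d2
direction_ok = inner(n1, v) == 0 and inner(n2, v) == 0


def line_point(t):
    """Point p + t*v on the line of intersection."""
    return tuple(pi + t * vi for pi, vi in zip(p, v))


samples_ok = all(
    inner(n1, line_point(t)) == d1 and inner(n2, line_point(t)) == d2
    for t in range(-5, 6)
)
```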


Activity 1.74 Carry out the calculations in the above example and 
verify that the line is in both planes. 


1.12 Lines and hyperplanes in R^n


1.12.1 Vectors and lines in R^n


We can apply similar geometric language to vectors in R^n. We can think
of the vector a = (a_1, a_2, ..., a_n)^T as defining a point in R^n. Using the
inner product (defined in Section 1.8.2), we define the length of a vector
x = (x_1, x_2, ..., x_n)^T by

\[ \|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} \qquad \text{or} \qquad \|\mathbf{x}\|^2 = \langle \mathbf{x}, \mathbf{x} \rangle. \]

We say that two vectors, v, w ∈ R^n are orthogonal if and only if

\[ \langle \mathbf{v}, \mathbf{w} \rangle = 0. \]

A line in R^n is the set of all points (x_1, x_2, ..., x_n) whose position
vectors x satisfy a vector equation of the form

\[ \mathbf{x} = \mathbf{p} + t\mathbf{v}, \qquad t \in \mathbb{R}, \]

where p is the position vector of one particular point on the line and v
is the direction of the line. If we can write x = tv, t ∈ R, then the line
goes through the origin.


1.12.2 Hyperplanes 


The set of all points (x_1, x_2, ..., x_n) which satisfy one Cartesian
equation,

\[ a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = d, \]

is called a hyperplane in R^n.
In R^2, a hyperplane is a line, and in R^3 it is a plane, but for n > 3
we simply use the term hyperplane. The vector

\[ \mathbf{n} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} \]

is a normal vector to the hyperplane. Writing the Cartesian equation in
vector form, a hyperplane is the set of all vectors, x ∈ R^n such that

\[ \langle \mathbf{n}, \mathbf{x} - \mathbf{p} \rangle = 0, \]

where the normal vector n and the position vector p of a point on the
hyperplane are given.
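
Because these definitions use only the inner product, they translate directly into code that works for any n. The sketch below is illustrative: the function names and the particular vectors in R^4 are made up for the example and do not come from the text.

```python
import math


def inner(a, b):
    """Inner product of two vectors in R^n."""
    return sum(x * y for x, y in zip(a, b))


def length(x):
    """Length of x, defined by ||x||^2 = <x, x>."""
    return math.sqrt(inner(x, x))


def on_hyperplane(n, p, x):
    """True when <n, x - p> = 0, i.e. x lies on the hyperplane through p
    with normal vector n."""
    return inner(n, [xi - pi for xi, pi in zip(x, p)]) == 0


# An illustrative hyperplane in R^4 (values chosen for this sketch only).
n = (1, 2, 0, -1)
p = (1, 0, 0, 1)

inside = on_hyperplane(n, p, (3, -1, 5, 1))    # x - p = (2, -1, 5, 0)
outside = on_hyperplane(n, p, (0, 0, 0, 1))    # x - p = (-1, 0, 0, 0)
norm = length((1, 2, 2, 4))                    # sqrt(1 + 4 + 4 + 16)
```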


Activity 1.75 How many Cartesian equations would you need to
describe a line in R^n? How many parameters would there be in a vector
equation of a hyperplane?


1.13 Learning outcomes 


You should now be able to: 


• explain what is meant by a matrix

• use matrix addition, scalar multiplication and matrix multiplication
appropriately (and know when and how these operations are defined)

• manipulate matrices algebraically

• state what is meant by the inverse of a square matrix, a power of a
square matrix and the transpose of a matrix, and know the properties
of these in order to manipulate them

• explain what is meant by a vector and by Euclidean n-space

• state what is meant by the inner product of two vectors and what
properties it satisfies

• state what is meant by the length and direction of a vector, and what
is meant by a unit vector

• state the relationship between the inner product and the length of a
vector and angle between two vectors

• explain what is meant by two vectors being orthogonal and how to
determine this

• find the equations, vector and Cartesian, of lines in R^2, lines and
planes in R^3, and work problems involving lines and planes

• state what is meant by a line and by a hyperplane in R^n.


1.14 Comments on activities 


Activity 1.3 For this matrix, a32 = 9. 
Activity 1.4 Only the second matrix is diagonal. 


Activity 1.11 AB is2 x 2 and BA is3 x 3, 
7 


5 10 
m sa= (2 i a 
3 3 4 


ive _ (1 3 _ (4 6 
Activity 1.12 AB=(, J BA=(5 A 


Activity 1.13 If A is m × n and B is n × p, then AB is an m × p
matrix. The size of a matrix is not changed by scalar multiplication,
so both λ(AB) and (λA)B are m × p. Looking at the (i, j) entries of
each,

\[ \begin{aligned} (\lambda(AB))_{ij} &= \lambda\,(a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}) \\ &= \lambda a_{i1}b_{1j} + \lambda a_{i2}b_{2j} + \cdots + \lambda a_{in}b_{nj} \\ &= ((\lambda A)B)_{ij}, \end{aligned} \]

so these two matrices are equal.

Activity 1.16 In this case, I is m × m.


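Activity 1.22

\[ AB = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]

and

\[ BA = \begin{pmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]

Therefore,

\[ A^{-1} = \begin{pmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{pmatrix}. \]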


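Activity 1.24 We will show one way (namely, that AA^{-1} = I), but you
should also show that A^{-1}A = I.

\[ \begin{aligned} AA^{-1} &= \begin{pmatrix} a & b \\ c & d \end{pmatrix} \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \\ &= \frac{1}{ad - bc} \begin{pmatrix} ad - bc & -ab + ba \\ cd - dc & -bc + ad \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \end{aligned} \]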


Activity 1.26 We will do the first, and leave the others to you. The
inverse of A^r is a matrix B such that A^r B = BA^r = I. So show that
the matrix B = (A^{-1})^r works:

\[ A^r (A^{-1})^r = \underbrace{(AA \cdots A)}_{r \text{ times}} \underbrace{(A^{-1}A^{-1} \cdots A^{-1})}_{r \text{ times}}. \]

Removing the brackets (matrix multiplication is associative) and
replacing each central AA^{-1} by I, the resultant will eventually be
AIA^{-1} = AA^{-1} = I. To complete the proof, show also that
(A^{-1})^r A^r = I. Therefore, (A^r)^{-1} = (A^{-1})^r.


Activity 1.29 Given the sizes of A and B, the matrix AB is m × p, so
(AB)^T is p × m. Also, A^T is n × m and B^T is p × n, so the only way
these matrices can be multiplied is as B^T A^T (unless m = p).


Activity 1.30 The (i, j) entry of B^T A^T is obtained by taking row i of B^T,
which is column i of B, and multiplying each term by the corresponding
entry of column j of A^T, which is row j of A, and then summing the
products:

\[ (B^T A^T)_{ij} = b_{1i}a_{j1} + b_{2i}a_{j2} + \cdots + b_{ni}a_{jn}. \]

This produces the same scalar as the (i, j) entry of (AB)^T.

Activity 1.32 The matrix is


1 4 5 
A= [: > =] = A". 
5 -7 3 


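Activity 1.33

\[ \mathbf{a}^T\mathbf{b} = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix} \begin{pmatrix} 4 \\ -2 \\ 1 \end{pmatrix} = (3), \]

\[ \mathbf{a}\mathbf{b}^T = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \begin{pmatrix} 4 & -2 & 1 \end{pmatrix} = \begin{pmatrix} 4 & -2 & 1 \\ 8 & -4 & 2 \\ 12 & -6 & 3 \end{pmatrix}. \]
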
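Activity 1.37 To prove properties (ii) and (iii), apply the definition to
the LHS (left-hand side) of the equation and rearrange the terms to
obtain the RHS (right-hand side). For example, for x, y ∈ R^n, using the
properties of real numbers:

\[ \begin{aligned} \alpha\langle \mathbf{x}, \mathbf{y} \rangle &= \alpha(x_1y_1 + x_2y_2 + \cdots + x_ny_n) \\ &= \alpha x_1y_1 + \alpha x_2y_2 + \cdots + \alpha x_ny_n \\ &= (\alpha x_1)y_1 + (\alpha x_2)y_2 + \cdots + (\alpha x_n)y_n = \langle \alpha\mathbf{x}, \mathbf{y} \rangle. \end{aligned} \]

Do the same for property (iii).

The single property ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩ implies
property (ii) by letting β = 0 for the first equality and then letting α = 0 for
the second, and property (iii) by letting α = β = 1. On the other hand,
if properties (ii) and (iii) hold, then

\[ \begin{aligned} \langle \alpha\mathbf{x} + \beta\mathbf{y}, \mathbf{z} \rangle &= \langle \alpha\mathbf{x}, \mathbf{z} \rangle + \langle \beta\mathbf{y}, \mathbf{z} \rangle && \text{by property (iii)} \\ &= \alpha\langle \mathbf{x}, \mathbf{z} \rangle + \beta\langle \mathbf{y}, \mathbf{z} \rangle && \text{by property (ii).} \end{aligned} \]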


Activity 1.42 |/a|| = 5, so 


-174 5. epee? ‘) 
u=3 (3) ii w=-5 (3) 


Activity 1.45 In the figure below

[Figure: the point (a_1, a_2, a_3) and the point (a_1, a_2, 0) in the xy-plane, with the y and z axes labelled.]

the line from the origin to the point (a_1, a_2, 0) lies in the xy-plane, and
by Pythagoras’ theorem, it has length \sqrt{a_1^2 + a_2^2}. Applying Pythagoras’
theorem again to the right triangle shown, we have

\[ \|\mathbf{a}\| = \sqrt{\left(\sqrt{a_1^2 + a_2^2}\right)^2 + a_3^2} = \sqrt{a_1^2 + a_2^2 + a_3^2}. \]


Activity 1.46 We have 


mO G) G) 


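The cosines of the three angles are given by

\[ \frac{\langle \mathbf{a}, \mathbf{b} \rangle}{\|\mathbf{a}\|\,\|\mathbf{b}\|} = \frac{-1 + 2 + 8}{\sqrt{9}\sqrt{18}} = \frac{1}{\sqrt{2}}, \qquad \frac{\langle \mathbf{a}, \mathbf{c} \rangle}{\|\mathbf{a}\|\,\|\mathbf{c}\|} = 0, \qquad \frac{\langle \mathbf{b}, \mathbf{c} \rangle}{\|\mathbf{b}\|\,\|\mathbf{c}\|} = \frac{2 - 1 + 8}{\sqrt{18}\sqrt{9}} = \frac{1}{\sqrt{2}}. \]

Thus, the triangle has a right-angle, and two angles of π/4.
Alternatively, as the vectors a and c are orthogonal, and have the
same length, it follows immediately that the triangle is right-angled and
isosceles.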


Activity 1.49 If t = 3, then q = (3, 7)^T. You are asked to sketch the
position vector q as this sum to illustrate that the vector q does locate a
point on the line, but the vector q does not lie on the line.


Activity 1.50 Here s = —1. 


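Activity 1.51 We will work through this for the second equation and
leave the first for you. We have, for s ∈ R,

\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} + s \begin{pmatrix} -2 \\ -4 \end{pmatrix} \ \Longrightarrow\ \begin{aligned} x &= 1 - 2s \\ y &= 3 - 4s \end{aligned} \ \Longrightarrow\ \frac{1 - x}{2} = s = \frac{3 - y}{4}, \]

which yields 2(1 − x) = 3 − y or y = 2x + 1.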


Activity 1.52 A vector equation of the line is

\[ \mathbf{x} = \begin{pmatrix} -1 \\ 1 \end{pmatrix} + t \begin{pmatrix} 4 \\ 1 \end{pmatrix} = \mathbf{p} + t\mathbf{v}, \qquad t \in \mathbb{R}, \]

where we have used p to locate a point on the line, and the direction
vector, v = q − p. The point (7, 3) is on the line (t = 2), and this is
the only point of this form on the line, since once 7 is chosen for the x
coordinate, the y coordinate is determined.


Activity 1.55 Once given, for example, that the x coordinate is x = 3, 
the parameter t of the vector equation is determined. Therefore, so too 
are the other two coordinates. We saw in the example that t = 2 satisfies 
the first two equations and it certainly does not satisfy the third equation, 
1=0-t. 


Activity 1.56 This is similar to the earlier activity in R?. A vector 
equation of the line is 


a 
| 
ti 
Ne | 
= 
xo 
+ 
~ 
[7N 
BEREN 
= 
Sauna 
II 
gz) 
+ 
~ 
x 
~ 
M 
A 


The point (7, 1,3) is not on this line, but the point (—5, 0,3) is on 
the line. The value t = —1 will then satisfy all three component equa- 


tions. There is, of course, only one possible choice for the values of c 
and d. 


Activity 1.59 The lines are not parallel because their direction vectors 
are not parallel. 


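Activity 1.62 If v and w are parallel, then this equation represents a
line in the direction v. If w = λv, then this line can be written as

\[ \mathbf{x} = \mathbf{p} + (s + \lambda t)\mathbf{v} = \mathbf{p} + r\mathbf{v}, \qquad \text{where } r = s + \lambda t \in \mathbb{R}. \]


Activity 1.69 Using the properties of the inner product, we have for any
s, t ∈ R,

\[ \langle \mathbf{n}, s\mathbf{v} + t\mathbf{w} \rangle = s\langle \mathbf{n}, \mathbf{v} \rangle + t\langle \mathbf{n}, \mathbf{w} \rangle = s \cdot 0 + t \cdot 0 = 0. \]

Activity 1.71 Equating components in the vector equation, we have
3 = s and 7 = t from the first two equations, and these values do satisfy
the third equation, 2 = 4 − 3s + t.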


Activity 1.72 The parallel planes must each contain the direction
vectors of each of the lines as displacement vectors, so the vector equations
of the planes are, respectively

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + s \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t \begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix} \]

and

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \\ 1 \end{pmatrix} + s \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t \begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}. \]

The parallel planes have the same normal vector, which we need
for the Cartesian equations. Recall that in Example 1.70, we found a
Cartesian equation and a normal vector to the first plane, the plane
which contains L_1:

\[ 3x - y + z = 4, \qquad \text{with } \mathbf{n} = \begin{pmatrix} 3 \\ -1 \\ 1 \end{pmatrix}. \]

Note that the point (1, 3, 4) is on this plane because it satisfies the
equation, but the point (5, 6, 1) does not. Substituting (5, 6, 1) into the
equation 3x − y + z = d, we find the Cartesian equation of the parallel
plane which contains L_2 is

\[ 3x - y + z = 10. \]


Activity 1.74 As stated, to verify that the line is in both planes, show 
that its direction vector is perpendicular to the normal vector of each 
plane, and that the point (2, —1, 0) satisfies both equations. 


Activity 1.75 To describe a line in R^n, you need n − 1 Cartesian
equations. A vector parametric equation of a hyperplane in R^n would require
n − 1 parameters.


1.15 Exercises 


Exercise 1.1 Given the matrices: 
1 0 1 


a s=(2 i), 
1 1 -1 
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which of the following matrix expressions are defined? Compute those 
which are defined. 


(a)4d @©)A4B+C (©4+CT (ACC (@BC 
AdB (@Cd Wda dd. 


Exercise 1.2 Find, if possible, a matrix A and a constant x such that 


1 7 —4 14 
(s o)a= (1s o). 

9 3 24 x 
Exercise 1.3 If A and B are invertible n × n matrices, then using the
definition of the inverse, prove that

\[ (AB)^{-1} = B^{-1}A^{-1}. \]

Exercise 1.4 Solve for the matrix A: 

10)? 1 -2\7! 

T = 
a SAE g gt) > 


Exercise 1.5 Suppose A and B are matrices such that A and AB are
invertible. Suppose, furthermore, that

\[ (AB)^{-1} = 2A^{-1}. \]

Find B.


Exercise 1.6 If B is an m × k matrix, show that the matrix B^T B is a
k × k symmetric matrix.


Exercise 1.7 Let A be an m × n matrix and B an n × n matrix.
Simplify, as much as possible, the expression

\[ (A^TA)^{-1}A^T(B^{-1}A^T)^TB^TB^2B^{-1} \]

assuming that any matrix inverse in the expression is defined.


Exercise 1.8 Write down a vector equation for each of the following 
lines. 


(a) In R^2, the line through the points (3, 1) and (−2, 4).
(b) In R^5, the line through the points (3, 1, −1, 2, 5) and
(−2, 4, 0, 1, 1). Is the point (4, 3, 2, 1, 4) on this line?


Exercise 1.9 Find the vector equation of the line in R^3 with Cartesian
equations
x-—1 4 


an a 4

Exercise 1.10 Let

L_1 be the line with equation

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} + t \begin{pmatrix} -1 \\ 5 \\ 4 \end{pmatrix}, \]

L_2 the line through (8, 0, −3) parallel to the vector (6, 2, −1)^T,

L_3 the line through (9, 3, 1) and (7, 13, 9).

Show that two of the lines intersect, two are parallel and two are skew.
Find the angle of intersection of the two intersecting lines.


Exercise 1.11 Referring to the previous exercise, find the vector
equation and the Cartesian equation of the plane containing L_1 and L_3.


Exercise 1.12 Show that the line 


-GG 


does not intersect the plane 2x + z = 9. 

Find the equation of the line through the point (2, 3, 1) which is 
parallel to the normal vector of the plane, and determine at what point 
it intersects the plane. Hence, or otherwise, find the distance of the line 
to the plane. 


1.16 Problems 


Problem 1.1 Given the matrices 


2 1 1 1 2 1 
a-(1 1) v=(1), = 0 -1}, 
0 3 =| 4 1 1 
0 1 
p= (2 I 
6 3 


which of the following matrix expressions are defined? Compute those 
which are defined. 
(a) Ab (b)CA (c)A+Cb (d)A+D (e)b'D 
(f) DA"+C (e)b'b (h)bb" (i) Cb. 


Problem 1.2 If a and b are both column matrices of the same size,
n × 1, what is the size of the matrix product a^T b?

What is the size of the matrix expression b^T a?

What is the relationship between a^T b and b^T a?

Problem 1.3 Let

\[ B = \begin{pmatrix} 3 & 7 \\ 0 & -1 \end{pmatrix} \qquad \text{and suppose} \qquad B^{-1} = \begin{pmatrix} x & y \\ z & w \end{pmatrix}. \]

Solve the system of four equations given by the matrix equation
BB^{-1} = I,

\[ \begin{pmatrix} 3 & 7 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} x & y \\ z & w \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \]

to obtain the matrix B^{-1}.

Check your solution by finding B^{-1} using the result in Activity 1.24.
Then make absolutely sure B^{-1} is correct by checking that BB^{-1} =
B^{-1}B = I.


Problem 1.4 Find the matrix 4 if 


3 5 
-IT _ 
OE] 
Problem 1.5 A square matrix M is said to be skew symmetric if
M = −M^T.

Given that the 3 x 3 matrix A = (a;;) is symmetric and the 3 x 3 


matrix B = (b;;) is skew symmetric, find the missing entries in the 
following matrices: 


1 7 
a=(-«s a={ 3 02). 
0 2 —5 


Problem 1.6 If A is an n × n matrix, show that the matrix A + A^T
is symmetric and the matrix A − A^T is skew symmetric. (See
Problem 1.5.)

Show that any matrix A can be written as the sum of a symmetric
matrix and a skew symmetric matrix.


a b 
aS 4) 
is a 2 x 2 matrix such that AB = BA for all 2 x 2 matrices B. Show 
thata = d, b = 0, c = 0. Deduce that the only such matrices are scalar 
multiples of the identity matrix. 


Hint: If something is true for all 2 x 2 matrices, then it is true for 
any such matrix. Try some simple choices for B, such as 


1 0 
B= (4 a 


Problem 1.7 Suppose that 
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and calculate 4B = BA. Use the fact that two matrices are equal if 
and only if corresponding entries are equal to derive the conditions on 
a,b,c,d. 

Can you generalise the result to 3 x 3 matrices? To n x n matrices? 


Problem 1.8 Find a vector equation of the line through the points 
A = (4,5, 1) and B = (1,3, —2). 

Find values of c and d such that the points 4, B and C = (c, d, —5) 
are collinear; that is, are points on the same line. 


Problem 1.9 Show that the line 


1 2 
1 -1 


intersects the line with Cartesian equations, 


A 


5 pe i 
x= ’ y —_ 2 9 


and find the point of intersection. 


Problem 1.10 What is the relationship between the lines with equations 


1 3 8 1 
r= (2) +:(-2) rer JOLE 
1 1 —3 2 


Problem 1.11 Find the Cartesian equation of the plane which contains
the point (5, 1, 3) and has normal vector n = (1, −4, 2)^T.
Find also a vector (parametric) equation of this plane.


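Problem 1.12 Find a Cartesian equation of the plane given by

\[ \mathbf{x} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + s \begin{pmatrix} 2 \\ 1 \\ -1 \end{pmatrix} + t \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}, \qquad s, t \in \mathbb{R}. \]

Show that the equation

\[ \mathbf{x} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + s \begin{pmatrix} 6 \\ 2 \\ -5 \end{pmatrix} + t \begin{pmatrix} 2 \\ -2 \\ -7 \end{pmatrix}, \qquad s, t \in \mathbb{R}, \]

represents the same plane.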


Problem 1.13 Find the equation of the plane containing the two
intersecting lines of Problem 1.9.


Problem 1.14 Find the point of intersection of the line
2 6 
= yi | a ) 
4 2 


with the plane

\[ x + y - 3z = 4. \]


Systems of linear equations


Being able to solve systems of many linear equations in many unknowns 
is a vital part of linear algebra. We use matrices and vectors as essential 
elements in obtaining and expressing the solutions. 

We begin by expressing a system in matrix form and defining ele- 
mentary row operations on a related matrix, known as the augmented 
matrix. These operations mimic the standard operations we would use 
to solve systems of equations by eliminating variables. We then learn a 
precise algorithm to apply these operations in order to put the matrix in 
a special form known as reduced echelon form, from which the general 
solution to the system is readily obtained. The method of manipulat- 
ing matrices in this way to obtain the solution is known as Gaussian 
elimination. 

We then examine the forms of solutions to systems of linear equa- 
tions and look at their properties, defining what is meant by a homoge- 
neous system and the null space of a matrix. 


2.1 Systems of linear equations 


A system of m linear equations in n unknowns x_1, x_2, ..., x_n is a set of
m equations of the form

\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\ \,\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m. \end{aligned} \]

The numbers a_{ij} are known as the coefficients of the system.

Example 2.1 The set of equations

\[ \begin{aligned} x_1 + x_2 + x_3 &= 3 \\ 2x_1 + x_2 + x_3 &= 4 \\ x_1 - x_2 + 2x_3 &= 5 \end{aligned} \]

is a system of three linear equations in the three unknowns x_1, x_2, x_3.


Systems of linear equations occur naturally in a number of applications. 
We say that s_1, s_2, ..., s_n is a solution of the system if all m
equations hold true when

\[ x_1 = s_1, \quad x_2 = s_2, \quad \ldots, \quad x_n = s_n. \]


Sometimes a system of linear equations is known as a set of simul- 
taneous equations; such terminology emphasises that a solution is an 
assignment of values to each of the n unknowns such that each and 
every equation holds with this assignment. It is also referred to simply 
as a linear system. 


Example 2.2 The linear system

\[ \begin{aligned} x_1 + x_2 + x_3 + x_4 + x_5 &= 3 \\ 2x_1 + x_2 + x_3 + x_4 + 2x_5 &= 4 \\ x_1 - x_2 - x_3 + x_4 + x_5 &= 5 \\ x_1 + x_4 + x_5 &= 4 \end{aligned} \]

is an example of a system of four equations in five unknowns,
x_1, x_2, x_3, x_4, x_5. One solution of this system is

\[ x_1 = -1, \quad x_2 = -2, \quad x_3 = 1, \quad x_4 = 3, \quad x_5 = 2, \]

as you can easily verify by substituting these values into the equations.
Every equation is satisfied for these values of x_1, x_2, x_3, x_4, x_5.
However, this is not the only solution to this system of equations. There are
many more.

On the other hand, the system of linear equations

\[ \begin{aligned} x_1 + x_2 + x_3 + x_4 + x_5 &= 3 \\ 2x_1 + x_2 + x_3 + x_4 + 2x_5 &= 4 \\ x_1 - x_2 - x_3 + x_4 + x_5 &= 5 \\ x_1 + x_4 + x_5 &= 6 \end{aligned} \]

has no solutions. There are no numbers we can assign to the unknowns
x_1, x_2, x_3, x_4, x_5 so that all four equations are satisfied.

How do we know this? How do we find all the solutions to a system
of linear equations?
We begin by writing a system of linear equations in matrix form.


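Definition 2.3 (Coefficient matrix) The matrix A = (a_{ij}), whose (i, j)
entry is the coefficient a_{ij} of the system of linear equations, is called the
coefficient matrix

\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. \]

Let x = (x_1, x_2, ..., x_n)^T be the vector of unknowns. Then the product
Ax of the m × n coefficient matrix A and the n × 1 column vector x is
an m × 1 matrix,

\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{pmatrix}, \]

whose entries are the left-hand sides of our system of linear equations.
If we define another column vector b, whose m components are the
right-hand sides b_i, the system is equivalent to the matrix equation

\[ A\mathbf{x} = \mathbf{b}. \]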


Example 2.4 Consider the following system of three linear equations 
in the three unknowns, x1, x2, x3: 


$$\begin{aligned}
x_1 + x_2 + x_3 &= 3\\
2x_1 + x_2 + x_3 &= 4\\
x_1 - x_2 + 2x_3 &= 5.
\end{aligned}$$


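This system can be written in matrix notation as $A\mathbf{x} = \mathbf{b}$ with


$$A = \begin{pmatrix} 1 & 1 & 1\\ 2 & 1 & 1\\ 1 & -1 & 2 \end{pmatrix},
\quad
\mathbf{x} = \begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix},
\quad
\mathbf{b} = \begin{pmatrix} 3\\ 4\\ 5 \end{pmatrix}.$$


The entries of the matrix $A$ are the coefficients of the $x_i$. If we
perform the matrix multiplication of $A\mathbf{x}$,


$$\begin{pmatrix} 1 & 1 & 1\\ 2 & 1 & 1\\ 1 & -1 & 2 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}
=
\begin{pmatrix} x_1 + x_2 + x_3\\ 2x_1 + x_2 + x_3\\ x_1 - x_2 + 2x_3 \end{pmatrix},$$


the matrix product is a $3 \times 1$ matrix, a column vector. If
$A\mathbf{x} = \mathbf{b}$, then


$$\begin{pmatrix} x_1 + x_2 + x_3\\ 2x_1 + x_2 + x_3\\ x_1 - x_2 + 2x_3 \end{pmatrix}
=
\begin{pmatrix} 3\\ 4\\ 5 \end{pmatrix},$$


and these two $3 \times 1$ matrices are equal if and only if their
components are equal. This gives precisely the three linear equations.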


2.2 Row operations 


Our purpose is to find an efficient means of finding the solutions of 
systems of linear equations. 

Consider Example 2.4. An elementary way of solving a system of 
linear equations such as 


$$\begin{aligned}
x_1 + x_2 + x_3 &= 3\\
2x_1 + x_2 + x_3 &= 4\\
x_1 - x_2 + 2x_3 &= 5
\end{aligned}$$


is to begin by eliminating one of the variables from two of the equations.
For example, we can eliminate $x_1$ from the second equation by
multiplying the first equation by 2 and then subtracting it from the second
equation.

Let’s do this. Twice the first equation gives $2x_1 + 2x_2 + 2x_3 = 6$.
Subtracting this from the second equation, $2x_1 + x_2 + x_3 = 4$, yields
the equation $-x_2 - x_3 = -2$. We can now replace the second equation
in the original system by this new equation,


$$\begin{aligned}
x_1 + x_2 + x_3 &= 3\\
-x_2 - x_3 &= -2\\
x_1 - x_2 + 2x_3 &= 5
\end{aligned}$$


and the new system will have the same set of solutions as the original 
system. 

We can continue in this manner to obtain a simpler set of equations 
with the same solution set as the original system. Our next step might 
be to subtract the first equation from the last equation and replace the 
last equation, to obtain the system 


$$\begin{aligned}
x_1 + x_2 + x_3 &= 3\\
-x_2 - x_3 &= -2\\
-2x_2 + x_3 &= 2
\end{aligned}$$




so that the last two equations now only contain the two variables x2 and 
x3. We can then eliminate one of these variables to eventually obtain 
the solution. 

So exactly what operations can we perform on the equations of a 
linear system without altering the set of solutions? There are three main 
such types of operation, as follows: 


O1 multiply both sides of an equation by a non-zero constant. 
O2 interchange two equations. 
O3 add a multiple of one equation to another.


These operations do not alter the set of solutions: the new equations
follow from the old ones, and since each operation can be undone to
retrieve the old system, the restrictions on the variables
$x_1, x_2, \ldots, x_n$ given by the new equations also imply the
restrictions given by the old ones.

At the same time, we observe that these operations really only 
involve the coefficients of the variables and the right-hand sides of the 
equations. 

For example, using the same system as above expressed in matrix
form, $A\mathbf{x} = \mathbf{b}$, the matrix


$$(A|\mathbf{b}) = \left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
2 & 1 & 1 & 4\\
1 & -1 & 2 & 5
\end{array}\right),$$


which is the coefficient matrix $A$ together with the constants
$\mathbf{b}$ as the last column, contains all the information we need to
use, and rather than manipulating the equations, we can instead manipulate
the rows of this matrix. For example, subtracting twice equation 1 from
equation 2 is executed by subtracting twice row 1 from row 2.

These observations form the motivation behind a method to solve
systems of linear equations, known as Gaussian elimination. To solve
a linear system $A\mathbf{x} = \mathbf{b}$, we first form the augmented
matrix, denoted $(A|\mathbf{b})$, which is $A$ with column $\mathbf{b}$
tagged on.


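Definition 2.5 (Augmented matrix) If $A\mathbf{x} = \mathbf{b}$ is a
system of linear equations,


$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix},
\quad
\mathbf{x} = \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix},
\quad
\mathbf{b} = \begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{pmatrix},$$


then the matrix


$$(A|\mathbf{b}) = \left(\begin{array}{cccc|c}
a_{11} & a_{12} & \cdots & a_{1n} & b_1\\
a_{21} & a_{22} & \cdots & a_{2n} & b_2\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{array}\right)$$


is called the augmented matrix of the linear system.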


From the operations listed above for manipulating the equations of the 
linear system, we define corresponding operations on the rows of the 
augmented matrix. 


Definition 2.6 (Elementary row operations) These are: 


RO1 multiply a row by a non-zero constant. 
RO2 interchange two rows. 
RO3 add a multiple of one row to another. 
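
As an illustration, the three elementary row operations can be written as small Python functions acting on a matrix stored as a list of rows. This is a sketch under our own assumptions rather than anything prescribed by the text: the function names `scale_row`, `swap_rows` and `add_multiple` are hypothetical, rows are numbered from 0 in the code (so row 1 of the text is index 0), and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def scale_row(matrix, i, c):
    """RO1: multiply row i by the non-zero constant c."""
    if c == 0:
        raise ValueError("RO1 requires a non-zero constant")
    matrix[i] = [c * entry for entry in matrix[i]]


def swap_rows(matrix, i, j):
    """RO2: interchange rows i and j."""
    matrix[i], matrix[j] = matrix[j], matrix[i]


def add_multiple(matrix, target, source, c):
    """RO3: add c times row `source` to row `target`."""
    matrix[target] = [t + c * s for t, s in zip(matrix[target], matrix[source])]


# Augmented matrix (A|b) of the system in Example 2.4.
augmented = [[Fraction(v) for v in row]
             for row in ([1, 1, 1, 3], [2, 1, 1, 4], [1, -1, 2, 5])]

add_multiple(augmented, 1, 0, -2)      # row 2 minus twice row 1
print([str(v) for v in augmented[1]])  # ['0', '-1', '-1', '-2']
```

The new second row corresponds to the equation obtained earlier by eliminating the first variable from the second equation.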


2.3 Gaussian elimination 


We will describe a systematic method for solving systems of linear 
equations by an algorithm which uses row operations to put the aug- 
mented matrix into a form from which the solution of the linear system 
can be easily read. This method is known as Gaussian elimination or 
Gauss-Jordan elimination. To illustrate the algorithm, we will use two 
examples: the augmented matrix (A|b) of the example in the previous 
section and the augmented matrix (B|b) of a second system of linear 
equations, 

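$$(A|\mathbf{b}) = \left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
2 & 1 & 1 & 4\\
1 & -1 & 2 & 5
\end{array}\right),
\qquad
(B|\mathbf{b}) = \left(\begin{array}{ccc|c}
0 & 0 & 2 & 3\\
0 & 2 & 3 & 4\\
0 & 0 & 1 & 5
\end{array}\right).$$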


2.3.1 The algorithm: reduced row echelon form 


Using the above two examples, we will carry out the algorithm in detail. 


(1) Find the leftmost column that is not all zeros. 


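The augmented matrices are

$$\left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
2 & 1 & 1 & 4\\
1 & -1 & 2 & 5
\end{array}\right),
\qquad
\left(\begin{array}{ccc|c}
0 & 0 & 2 & 3\\
0 & 2 & 3 & 4\\
0 & 0 & 1 & 5
\end{array}\right).$$

So this is column 1 of $(A|\mathbf{b})$ and column 2 of $(B|\mathbf{b})$.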


(2) Get a non-zero entry at the top of this column. 
The matrix on the left already has a non-zero entry at the top. For 
the matrix on the right, we interchange row 1 and row 2: 


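$$\left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
2 & 1 & 1 & 4\\
1 & -1 & 2 & 5
\end{array}\right),
\qquad
\left(\begin{array}{ccc|c}
0 & 2 & 3 & 4\\
0 & 0 & 2 & 3\\
0 & 0 & 1 & 5
\end{array}\right).$$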


(3) Make this entry 1; multiply the first row by a suitable number or 
interchange two rows. This 1 entry is called a leading one. 
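The left-hand matrix already had a 1 in this position. For the second
matrix, we multiply row 1 by one-half:


$$\left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
2 & 1 & 1 & 4\\
1 & -1 & 2 & 5
\end{array}\right),
\qquad
\left(\begin{array}{ccc|c}
0 & 1 & \frac{3}{2} & 2\\
0 & 0 & 2 & 3\\
0 & 0 & 1 & 5
\end{array}\right).$$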


(4) Add suitable multiples of the top row to rows below so that all 
entries below the leading one become zero. 

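For the matrix on the left, we add $-2$ times row 1 to row 2, then we
add $-1$ times row 1 to row 3. These are the same operations as the ones
we performed earlier on the example using the equations. The matrix
on the right already has zeros under the leading one:


$$\left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
0 & -1 & -1 & -2\\
0 & -2 & 1 & 2
\end{array}\right),
\qquad
\left(\begin{array}{ccc|c}
0 & 1 & \frac{3}{2} & 2\\
0 & 0 & 2 & 3\\
0 & 0 & 1 & 5
\end{array}\right).$$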


At any stage, we can read the modified system of equations from the new
augmented matrix, remembering that column 1 gives the coefficients of
$x_1$, column 2 the coefficients of $x_2$ and so on, and that the last
column represents the right-hand side of the equations. For example, the
matrix on the left is now the augmented matrix of the system


$$\begin{aligned}
x_1 + x_2 + x_3 &= 3\\
-x_2 - x_3 &= -2\\
-2x_2 + x_3 &= 2.
\end{aligned}$$

The next step in the algorithm is 
(5) Cover up the top row and apply steps (1) to (4) again. 
This time we will work on one matrix at a time. After the first four 
steps, we have altered the augmented matrix (A|b) to 


$$(A|\mathbf{b}) \longrightarrow \left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
0 & -1 & -1 & -2\\
0 & -2 & 1 & 2
\end{array}\right).$$

We now ignore the top row. Then the leftmost column which is not all
zeros is column 2. This column already has a non-zero entry at the top.
We make it into a leading one by multiplying row 2 by $-1$:


$$\longrightarrow \left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
0 & 1 & 1 & 2\\
0 & -2 & 1 & 2
\end{array}\right).$$


This is now a leading one, and we use it to obtain zeros below. We add
2 times row 2 to row 3:


$$\longrightarrow \left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
0 & 1 & 1 & 2\\
0 & 0 & 3 & 6
\end{array}\right).$$


Now we cover up the top two rows and start again with steps (1) to (4).
The leftmost column which is not all zeros is column 3. We multiply
row 3 by one-third to obtain the final leading one:


$$\longrightarrow \left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
0 & 1 & 1 & 2\\
0 & 0 & 1 & 2
\end{array}\right).$$


This last matrix is in row echelon form, or simply, echelon form. 


Definition 2.7 (Row echelon form) A matrix is said to be in row 
echelon form (or echelon form) if it has the following three properties: 


(1) Every non-zero row begins with a leading one. 
(2) A leading one in a lower row is further to the right. 
(3) Zero rows are at the bottom of the matrix. 


Activity 2.8 Check that the above matrix satisfies these three 
properties. 


The term echelon form takes its name from the form of the equations 
at this stage. Reading from the matrix, these equations are 


$$\begin{aligned}
x_1 + x_2 + x_3 &= 3\\
x_2 + x_3 &= 2\\
x_3 &= 2.
\end{aligned}$$


We could now use a method called back substitution to find the solution 
of the system. The last equation tells us that x3 = 2. We can then 
substitute this into the second equation to obtain x2, and then use these 
two values to obtain xı. This is an acceptable approach, but we can 
effectively do the same calculations by continuing with row operations. 
So we continue with one final step of our algorithm. 
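
For comparison, the back substitution just described would run as follows on the echelon system above. This is a worked sketch of that calculation; the values agree with those obtained from the row operations.

$$\begin{aligned}
x_3 &= 2,\\
x_2 &= 2 - x_3 = 2 - 2 = 0,\\
x_1 &= 3 - x_2 - x_3 = 3 - 0 - 2 = 1.
\end{aligned}$$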

(6) Begin with the last row and add suitable multiples to each row
above to get zeros above the leading ones.

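Continuing from the row echelon form and using row 3, we replace
row 2 with row 2 $-$ row 3, and at the same time we replace row 1 with
row 1 $-$ row 3:


$$(A|\mathbf{b}) \longrightarrow \left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
0 & 1 & 1 & 2\\
0 & 0 & 1 & 2
\end{array}\right)
\longrightarrow \left(\begin{array}{ccc|c}
1 & 1 & 0 & 1\\
0 & 1 & 0 & 0\\
0 & 0 & 1 & 2
\end{array}\right).$$


We now have zeros above the leading one in column 3. There is only
one more step to do, and that is to get a zero above the leading one in
column 2. So the final step is row 1 $-$ row 2:


$$\longrightarrow \left(\begin{array}{ccc|c}
1 & 0 & 0 & 1\\
0 & 1 & 0 & 0\\
0 & 0 & 1 & 2
\end{array}\right).$$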


This final matrix is now in reduced row echelon form. It has the 
additional property that every column with a leading one has zeros 
elsewhere. 


Definition 2.9 (Reduced row echelon form) A matrix is said to be 
in reduced row echelon form (or reduced echelon form) if it has the 
following four properties: 


(1) Every non-zero row begins with a leading one. 

(2) A leading one in a lower row is further to the right. 
(3) Zero rows are at the bottom of the matrix. 

(4) Every column with a leading one has zeros elsewhere. 


If R is the reduced row echelon form of a matrix M, we will sometimes 
write R = RREF(M). 

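The solution can now be read from the matrix. The top row says
$x_1 = 1$, the second row says $x_2 = 0$ and the third row says
$x_3 = 2$. The original system has been reduced to the matrix equation


$$\begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}
=
\begin{pmatrix} 1\\ 0\\ 2 \end{pmatrix},$$


giving the solution


$$\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}
=
\begin{pmatrix} 1\\ 0\\ 2 \end{pmatrix}.$$


This system of equations has a unique solution.

We can check that this solution is the correct solution of the original
system by substituting it into the equations, or equivalently, by
multiplying out the matrices $A\mathbf{x}$ to show that
$A\mathbf{x} = \mathbf{b}$.


Activity 2.10 Do this: check that

$$\begin{pmatrix} 1 & 1 & 1\\ 2 & 1 & 1\\ 1 & -1 & 2 \end{pmatrix}
\begin{pmatrix} 1\\ 0\\ 2 \end{pmatrix}
=
\begin{pmatrix} 3\\ 4\\ 5 \end{pmatrix}.$$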


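We now return to the example $(B|\mathbf{b})$, which we left after the
first round of steps (1) to (4), and we apply step (5). We cover up the
top row and apply steps (1) to (4) again. We need to have a leading one in
the second row, which we achieve by switching row 2 and row 3:


$$(B|\mathbf{b}) \longrightarrow \left(\begin{array}{ccc|c}
0 & 1 & \frac{3}{2} & 2\\
0 & 0 & 2 & 3\\
0 & 0 & 1 & 5
\end{array}\right)
\longrightarrow \left(\begin{array}{ccc|c}
0 & 1 & \frac{3}{2} & 2\\
0 & 0 & 1 & 5\\
0 & 0 & 2 & 3
\end{array}\right).$$


We obtain a zero under this leading one by replacing row 3 with
row 3 $+\,(-2)$ times row 2:


$$\longrightarrow \left(\begin{array}{ccc|c}
0 & 1 & \frac{3}{2} & 2\\
0 & 0 & 1 & 5\\
0 & 0 & 0 & -7
\end{array}\right),$$


and then, finally, multiply row 3 by $-\frac{1}{7}$:


$$\longrightarrow \left(\begin{array}{ccc|c}
0 & 1 & \frac{3}{2} & 2\\
0 & 0 & 1 & 5\\
0 & 0 & 0 & 1
\end{array}\right).$$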


This matrix is now in row echelon form, but we shall see that there is 
no point in going on to reduced row echelon form. This last matrix is 
equivalent to the system 


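$$\begin{pmatrix} 0 & 1 & \frac{3}{2}\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}
=
\begin{pmatrix} 2\\ 5\\ 1 \end{pmatrix}.$$


What is the bottom equation of this system? Row 3 says
$0x_1 + 0x_2 + 0x_3 = 1$, that is, $0 = 1$, which is impossible! This
system has no solution.
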
Putting an augmented matrix into reduced row echelon form using 
this algorithm is usually the most efficient way to solve a system of 
linear equations. In a variation of the algorithm, when the leading one 
is obtained in the second row, it can be used to obtain zeros both below it 
(as in the algorithm) and also above it. Although this may look attractive, 
it actually uses more calculations on the remaining columns than the 
method given here, and this number becomes significant for large n. 
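
The six steps of the algorithm can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `rref` is our own, the matrix is a list of rows, and exact fractions are used. Where the text sometimes interchanges rows to bring an existing 1 into position, this sketch takes the first non-zero entry it finds and scales it, so intermediate matrices may differ from those shown above even though the reduced row echelon form reached is the same.

```python
from fractions import Fraction


def rref(matrix):
    """Return the reduced row echelon form of `matrix` (a list of rows)."""
    rows = [[Fraction(entry) for entry in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivots = []   # (row, column) positions of the leading ones
    top = 0       # rows above `top` are "covered up", as in step (5)

    for col in range(n_cols):
        if top == n_rows:
            break
        # Steps (1)-(2): find a non-zero entry in this column, at or below `top`.
        pivot_row = next((r for r in range(top, n_rows) if rows[r][col] != 0), None)
        if pivot_row is None:
            continue  # nothing to do in this column
        rows[top], rows[pivot_row] = rows[pivot_row], rows[top]
        # Step (3): scale the row to obtain a leading one.
        lead = rows[top][col]
        rows[top] = [entry / lead for entry in rows[top]]
        # Step (4): make the entries below the leading one zero.
        for r in range(top + 1, n_rows):
            factor = rows[r][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[top])]
        pivots.append((top, col))
        top += 1

    # Step (6): from the last leading one upwards, make the entries above zero.
    for pivot_row, col in reversed(pivots):
        for r in range(pivot_row):
            factor = rows[r][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
    return rows


reduced = rref([[1, 1, 1, 3], [2, 1, 1, 4], [1, -1, 2, 5]])
for row in reduced:
    print([str(entry) for entry in row])
```

Applied to $(B|\mathbf{b})$, the same function carries the reduction one stage further than the text does, but the bottom row it returns is still $(0\ 0\ 0\ 1)$, which is the row that signals that there is no solution.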


2.3.2 Consistent and inconsistent systems


Definition 2.11 (Consistent) A system of linear equations is said to be 
consistent if it has at least one solution. It is inconsistent if there are no 
solutions. 


The example above demonstrates the following important fact: 


• If the row echelon form (REF) of the augmented matrix $(A|\mathbf{b})$
contains a row $(0\ 0\ \cdots\ 0\ 1)$, then the system is inconsistent.
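
This test is easy to automate. The sketch below is an illustration with a hypothetical function name, `has_inconsistent_row`; it scans the rows of an augmented matrix for one whose coefficients are all zero while the final entry is not. It accepts any non-zero final entry, so it also recognises such a row before it has been scaled to end in 1.

```python
from fractions import Fraction


def has_inconsistent_row(augmented_rows):
    """Return True if some row reads (0 0 ... 0 | c) with c non-zero."""
    for row in augmented_rows:
        *coefficients, constant = row
        if all(a == 0 for a in coefficients) and constant != 0:
            return True
    return False


# Row echelon forms found above for (A|b) and (B|b).
echelon_A = [[1, 1, 1, 3], [0, 1, 1, 2], [0, 0, 1, 2]]
echelon_B = [[0, 1, Fraction(3, 2), 2], [0, 0, 1, 5], [0, 0, 0, 1]]

print(has_inconsistent_row(echelon_A))  # False
print(has_inconsistent_row(echelon_B))  # True
```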


It is instructive to look at the original systems represented by the aug- 
mented matrices above: 


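$$(A|\mathbf{b}) = \left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
2 & 1 & 1 & 4\\
1 & -1 & 2 & 5
\end{array}\right),
\qquad
(B|\mathbf{b}) = \left(\begin{array}{ccc|c}
0 & 0 & 2 & 3\\
0 & 2 & 3 & 4\\
0 & 0 & 1 & 5
\end{array}\right).$$

These are

$$\begin{aligned}
x_1 + x_2 + x_3 &= 3\\
2x_1 + x_2 + x_3 &= 4\\
x_1 - x_2 + 2x_3 &= 5
\end{aligned}
\qquad\qquad
\begin{aligned}
2x_3 &= 3\\
2x_2 + 3x_3 &= 4\\
x_3 &= 5.
\end{aligned}$$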


We see immediately that the system Bx = b is inconsistent since it is 
not possible for both the top and the bottom equation to hold. 

Since these are systems of three equations in three variables, we
can interpret these results geometrically. Each of the equations above
represents a plane in $\mathbb{R}^3$. The system $A\mathbf{x} = \mathbf{b}$
represents three planes which intersect in the point $(1, 0, 2)$. This is
the only point which lies on all three planes. The system
$B\mathbf{x} = \mathbf{b}$ represents three planes, two of which are
parallel (the horizontal planes $2x_3 = 3$ and $x_3 = 5$), so there
is no point that lies on all three planes.

We have been very careful when illustrating the Gaussian elimination
method to explain what the row operations were for each step of
the algorithm, but it is not necessary to include all this detail. The aim
is to use row operations to put the augmented matrix into reduced row
echelon form, and then read off the solutions from this form. Where it is
useful to indicate the operations, you can do so by writing, for example,
$R_2 - 2R_1$, where we always write down the row we are replacing first,
so that $R_2 - 2R_1$ indicates ‘replace row 2 ($R_2$) with row 2 plus $-2$
times row 1 ($R_2 - 2R_1$)’. Otherwise, you can just write down the
sequence of matrices linked by arrows. It is important to realise that
once you have performed a row operation on a matrix, the new matrix
obtained is not equal to the previous one; this is why you must use arrows
between the steps and not equal signs.


Example 2.12 We repeat the reduction of (A|b) to illustrate this for the 
system 


$$\begin{aligned}
x_1 + x_2 + x_3 &= 3\\
2x_1 + x_2 + x_3 &= 4\\
x_1 - x_2 + 2x_3 &= 5.
\end{aligned}$$


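Begin by writing down the augmented matrix, then apply the row
operations to carry out the algorithm. Here we will indicate the row
operations:


$$\begin{aligned}
(A|\mathbf{b}) = \left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
2 & 1 & 1 & 4\\
1 & -1 & 2 & 5
\end{array}\right)
&\xrightarrow{\;R_2 - 2R_1,\ R_3 - R_1\;}
\left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
0 & -1 & -1 & -2\\
0 & -2 & 1 & 2
\end{array}\right)\\
&\xrightarrow{\;(-1)R_2\;}
\left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
0 & 1 & 1 & 2\\
0 & -2 & 1 & 2
\end{array}\right)\\
&\xrightarrow{\;R_3 + 2R_2\;}
\left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
0 & 1 & 1 & 2\\
0 & 0 & 3 & 6
\end{array}\right)\\
&\xrightarrow{\;\frac{1}{3}R_3\;}
\left(\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
0 & 1 & 1 & 2\\
0 & 0 & 1 & 2
\end{array}\right).
\end{aligned}$$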

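The matrix is now in row echelon form. We continue to reduced row
echelon form:


$$\begin{aligned}
&\xrightarrow{\;R_1 - R_3,\ R_2 - R_3\;}
\left(\begin{array}{ccc|c}
1 & 1 & 0 & 1\\
0 & 1 & 0 & 0\\
0 & 0 & 1 & 2
\end{array}\right)\\
&\xrightarrow{\;R_1 - R_2\;}
\left(\begin{array}{ccc|c}
1 & 0 & 0 & 1\\
0 & 1 & 0 & 0\\
0 & 0 & 1 & 2
\end{array}\right).
\end{aligned}$$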


The augmented matrix is now in reduced row echelon form. 


Activity 2.13 Use Gaussian elimination to solve the following system
of equations:

$$\begin{aligned}
x_1 + x_2 + x_3 &= 6\\
2x_1 + 4x_2 + x_3 &= 5\\
2x_1 + 3x_2 + x_3 &= 6.
\end{aligned}$$


Be sure to follow the algorithm to put the augmented matrix into reduced 
row echelon form using row operations. 


2.3.3 Linear systems with free variables 


Gaussian elimination can be used to solve systems of linear equations 
with any number of equations and unknowns. We will now look at an 
example of a linear system with four equations in five unknowns: 
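
$$\begin{aligned}
x_1 + x_2 + x_3 + x_4 + x_5 &= 3\\
2x_1 + x_2 + x_3 + x_4 + 2x_5 &= 4\\
x_1 - x_2 - x_3 + x_4 + x_5 &= 5\\
x_1 + x_4 + x_5 &= 4.
\end{aligned}$$


The augmented matrix is


$$(A|\mathbf{b}) = \left(\begin{array}{ccccc|c}
1 & 1 & 1 & 1 & 1 & 3\\
2 & 1 & 1 & 1 & 2 & 4\\
1 & -1 & -1 & 1 & 1 & 5\\
1 & 0 & 0 & 1 & 1 & 4
\end{array}\right).$$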


Check that your augmented matrix is correct before you proceed, or 
you could be solving the wrong system! A good method is to first write 
down the coefficients by rows, reading across the equations, and then 
to check the columns do correspond to the coefficients of that variable. 
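Now follow the algorithm to put $(A|\mathbf{b})$ into reduced row echelon
form:


$$\begin{aligned}
(A|\mathbf{b})
&\xrightarrow{\;R_2 - 2R_1,\ R_3 - R_1,\ R_4 - R_1\;}
\left(\begin{array}{ccccc|c}
1 & 1 & 1 & 1 & 1 & 3\\
0 & -1 & -1 & -1 & 0 & -2\\
0 & -2 & -2 & 0 & 0 & 2\\
0 & -1 & -1 & 0 & 0 & 1
\end{array}\right)\\
&\xrightarrow{\;(-1)R_2\;}
\left(\begin{array}{ccccc|c}
1 & 1 & 1 & 1 & 1 & 3\\
0 & 1 & 1 & 1 & 0 & 2\\
0 & -2 & -2 & 0 & 0 & 2\\
0 & -1 & -1 & 0 & 0 & 1
\end{array}\right)\\
&\xrightarrow{\;R_3 + 2R_2,\ R_4 + R_2\;}
\left(\begin{array}{ccccc|c}
1 & 1 & 1 & 1 & 1 & 3\\
0 & 1 & 1 & 1 & 0 & 2\\
0 & 0 & 0 & 2 & 0 & 6\\
0 & 0 & 0 & 1 & 0 & 3
\end{array}\right)\\
&\xrightarrow{\;\frac{1}{2}R_3\;}
\left(\begin{array}{ccccc|c}
1 & 1 & 1 & 1 & 1 & 3\\
0 & 1 & 1 & 1 & 0 & 2\\
0 & 0 & 0 & 1 & 0 & 3\\
0 & 0 & 0 & 1 & 0 & 3
\end{array}\right)\\
&\xrightarrow{\;R_4 - R_3\;}
\left(\begin{array}{ccccc|c}
1 & 1 & 1 & 1 & 1 & 3\\
0 & 1 & 1 & 1 & 0 & 2\\
0 & 0 & 0 & 1 & 0 & 3\\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right).
\end{aligned}$$


This matrix is in row echelon form. We continue to reduced row echelon
form, starting with the third row:


$$\begin{aligned}
&\xrightarrow{\;R_1 - R_3,\ R_2 - R_3\;}
\left(\begin{array}{ccccc|c}
1 & 1 & 1 & 0 & 1 & 0\\
0 & 1 & 1 & 0 & 0 & -1\\
0 & 0 & 0 & 1 & 0 & 3\\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right)\\
&\xrightarrow{\;R_1 - R_2\;}
\left(\begin{array}{ccccc|c}
1 & 0 & 0 & 0 & 1 & 1\\
0 & 1 & 1 & 0 & 0 & -1\\
0 & 0 & 0 & 1 & 0 & 3\\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right).
\end{aligned}$$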


There are only three leading ones in the reduced row echelon form of 
this matrix. These appear in columns 1, 2 and 4. Since the last row gives 
no information, but merely states that 0 = 0, the matrix is equivalent to 
the system of equations: 


$$\begin{aligned}
x_1 + 0 + 0 + 0 + x_5 &= 1\\
x_2 + x_3 + 0 + 0 &= -1\\
x_4 + 0 &= 3.
\end{aligned}$$


The form of these equations tells us that we can assign any values to
$x_3$ and $x_5$, and then the values of $x_1$, $x_2$ and $x_4$ will be
determined.


Definition 2.14 (Leading variables) The variables corresponding to 
the columns with leading ones in the reduced row echelon form of an 
augmented matrix are called leading variables. The other variables are 
called non-leading variables. 




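In this example, the variables $x_1$, $x_2$ and $x_4$ are leading
variables, $x_3$ and $x_5$ are non-leading variables. We assign $x_3$,
$x_5$ the arbitrary values $s$, $t$, where $s$, $t$ represent any real
numbers, and then solve for the leading variables in terms of these. We get


$$x_4 = 3,\quad x_2 = -1 - s,\quad x_1 = 1 - t.$$


Then we express this solution in vector form:


$$\mathbf{x} = \begin{pmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5 \end{pmatrix}
= \begin{pmatrix} 1 - t\\ -1 - s\\ s\\ 3\\ t \end{pmatrix}
= \begin{pmatrix} 1\\ -1\\ 0\\ 3\\ 0 \end{pmatrix}
+ s\begin{pmatrix} 0\\ -1\\ 1\\ 0\\ 0 \end{pmatrix}
+ t\begin{pmatrix} -1\\ 0\\ 0\\ 0\\ 1 \end{pmatrix}.$$


Observe that there are infinitely many solutions, because any values of
$s \in \mathbb{R}$ and $t \in \mathbb{R}$ will give a solution.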

The solution given above is called a general solution of the system,
because it gives a solution for any values of $s$ and $t$, and any solution
of the equation is of this form for some $s, t \in \mathbb{R}$. For any
particular assignment of values to $s$ and $t$, such as $s = 0$, $t = 1$,
we obtain a particular solution of the system.
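
To see the general solution at work, the following Python sketch (the names `general_solution` and `times` are our own) builds the solution vector for a few choices of the parameters and checks that each one satisfies all four equations of the system.

```python
# Coefficient matrix and right-hand side of the system in this section.
A = [
    [1, 1, 1, 1, 1],
    [2, 1, 1, 1, 2],
    [1, -1, -1, 1, 1],
    [1, 0, 0, 1, 1],
]
b = [3, 4, 5, 4]


def general_solution(s, t):
    """The vector (1 - t, -1 - s, s, 3, t) read off from the reduced matrix."""
    return [1 - t, -1 - s, s, 3, t]


def times(matrix, x):
    """Matrix-vector product, one entry per row of `matrix`."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in matrix]


for s, t in [(0, 1), (-2, 5), (7, -3)]:
    x = general_solution(s, t)
    print((s, t), x, times(A, x) == b)
```

Each line printed ends in `True`. A finite number of checks like this is only an illustration, of course; that every choice of the parameters works follows from the algebra above.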


Activity 2.15 Let $s = 0$ and $t = 0$ and show (by substituting it into
the equation or multiplying $A\mathbf{x}_0$) that
$\mathbf{x}_0 = (1, -1, 0, 3, 0)^T$ is a solution of
$A\mathbf{x} = \mathbf{b}$. Then let $s = 1$ and $t = 2$ and show that the
new vector $\mathbf{x}_1$ you obtain is also a solution.


With practice, you will be able to read the general solution directly from
the reduced row echelon form of the augmented matrix. We have


$$(A|\mathbf{b}) \longrightarrow \cdots \longrightarrow
\left(\begin{array}{ccccc|c}
1 & 0 & 0 & 0 & 1 & 1\\
0 & 1 & 1 & 0 & 0 & -1\\
0 & 0 & 0 & 1 & 0 & 3\\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right).$$


Locate the leading ones, and note which are the leading variables. Then
locate the non-leading variables and assign each an arbitrary parameter.
So, as above, we note that the leading ones are in the first, second and
fourth column, and so correspond to $x_1$, $x_2$ and $x_4$. Then we assign
arbitrary parameters to the non-leading variables; that is, values such
as $x_3 = s$ and $x_5 = t$, where $s$ and $t$ represent any real numbers.
Then write down the vector $\mathbf{x} = (x_1, x_2, x_3, x_4, x_5)^T$ (as a
column) and fill in the values starting with $x_5$ and working up. We have
$x_5 = t$. Then the third row tells us that $x_4 = 3$. We have $x_3 = s$.
Now look at the second row, which says $x_2 + x_3 = -1$, or
$x_2 = -1 - s$. Then the top row tells us that $x_1 = 1 - t$. In this way,
we obtain the solution in vector form.
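
The procedure for reading off the solution can be sketched in Python. The function name `general_solution_from_rref` is hypothetical, and the sketch assumes that its input is the reduced row echelon form of the augmented matrix of a consistent system. It returns the constant vector together with one vector for each non-leading variable.

```python
from fractions import Fraction


def general_solution_from_rref(rref_rows):
    """Return (particular, directions) for a consistent system in RREF."""
    n = len(rref_rows[0]) - 1           # number of unknowns
    leading = {}                        # column of a leading one -> its row
    for i, row in enumerate(rref_rows):
        for j in range(n):
            if row[j] != 0:
                leading[j] = i
                break
    free = [j for j in range(n) if j not in leading]

    # Constant part: leading variables take the right-hand sides, others 0.
    particular = [Fraction(0)] * n
    for j, i in leading.items():
        particular[j] = Fraction(rref_rows[i][n])

    # One vector per non-leading variable (the parameter's coefficients).
    directions = []
    for f in free:
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for j, i in leading.items():
            v[j] = -Fraction(rref_rows[i][f])   # term moved to the right-hand side
        directions.append(v)
    return particular, directions


reduced = [
    [1, 0, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, -1],
    [0, 0, 0, 1, 0, 3],
    [0, 0, 0, 0, 0, 0],
]
particular, directions = general_solution_from_rref(reduced)
print([str(c) for c in particular])
for v in directions:
    print([str(c) for c in v])
```

For the matrix above this yields the constant vector $(1, -1, 0, 3, 0)^T$ and the vectors $(0, -1, 1, 0, 0)^T$ and $(-1, 0, 0, 0, 1)^T$ multiplying $s$ and $t$.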


Activity 2.16 Write down the system of three linear equations in three 
unknowns represented by the matrix equation Ax = b, where 


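$$A = \begin{pmatrix} 1 & 2 & 1\\ 2 & 2 & 0\\ 3 & 4 & 1 \end{pmatrix},
\quad
\mathbf{x} = \begin{pmatrix} x\\ y\\ z \end{pmatrix},
\quad
\mathbf{b} = \begin{pmatrix} 3\\ 2\\ 5 \end{pmatrix}.$$


Use Gaussian elimination to solve the system. Express your solution
in vector form. If each equation represents the Cartesian equation of a
plane in $\mathbb{R}^3$, describe the intersection of these three planes.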


2.3.4 Solution sets 


We have seen systems of linear equations which have a unique solution, 
no solution and infinitely many solutions. It turns out that these are the 
only possibilities. 


Theorem 2.17 A system of linear equations either has no solutions, a 
unique solution or infinitely many solutions. 


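Proof. To see this, suppose we have a linear system
$A\mathbf{x} = \mathbf{b}$ which has two distinct solutions, $\mathbf{p}$
and $\mathbf{q}$. So the system has a solution and it is not unique.
Thinking of these vector solutions as determining points in
$\mathbb{R}^n$, we will show that every point on the line through
$\mathbf{p}$ and $\mathbf{q}$ is also a solution. Therefore, as soon as
there is more than one solution, there must be infinitely many.

If $\mathbf{p}$ and $\mathbf{q}$ are vectors such that
$A\mathbf{p} = \mathbf{b}$ and $A\mathbf{q} = \mathbf{b}$,
$\mathbf{p} \neq \mathbf{q}$, then the equation of the line through
$\mathbf{p}$ and $\mathbf{q}$ is


$$\mathbf{v} = \mathbf{p} + t(\mathbf{q} - \mathbf{p}), \quad t \in \mathbb{R}.$$


Then for any vector $\mathbf{v}$ on the line, we have
$A\mathbf{v} = A(\mathbf{p} + t(\mathbf{q} - \mathbf{p}))$.
Using the distributive laws,


$$A\mathbf{v} = A\mathbf{p} + tA(\mathbf{q} - \mathbf{p})
= A\mathbf{p} + t(A\mathbf{q} - A\mathbf{p})
= \mathbf{b} + t(\mathbf{b} - \mathbf{b}) = \mathbf{b}.$$


Therefore, $\mathbf{v}$ is also a solution for any $t \in \mathbb{R}$, so
there are infinitely many of them.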
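
As a concrete check of this argument (an illustration only, not part of the proof), take two solutions of the system of four equations in five unknowns solved in Section 2.3.3, namely $\mathbf{p} = (1, -1, 0, 3, 0)^T$ and $\mathbf{q} = (-1, -2, 1, 3, 2)^T$. The point on the line with $t = 2$ is

$$\mathbf{v} = \mathbf{p} + 2(\mathbf{q} - \mathbf{p})
= \begin{pmatrix} 1\\ -1\\ 0\\ 3\\ 0 \end{pmatrix}
+ 2\begin{pmatrix} -2\\ -1\\ 1\\ 0\\ 2 \end{pmatrix}
= \begin{pmatrix} -3\\ -3\\ 2\\ 3\\ 4 \end{pmatrix},$$

and substituting into the four equations gives

$$\begin{aligned}
-3 - 3 + 2 + 3 + 4 &= 3,\\
2(-3) - 3 + 2 + 3 + 2(4) &= 4,\\
-3 + 3 - 2 + 3 + 4 &= 5,\\
-3 + 3 + 4 &= 4,
\end{aligned}$$

so $\mathbf{v}$ is again a solution.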


Notice that in this proof the vector $\mathbf{w} = \mathbf{q} - \mathbf{p}$
satisfies the equation $A\mathbf{x} = \mathbf{0}$. This leads us to our
next topic.


2.4 Homogeneous systems and null space


2.4.1 Homogeneous systems


Definition 2.18 A homogeneous system of linear equations is a linear
system of the form $A\mathbf{x} = \mathbf{0}$.


There is one easy, but important, fact about homogeneous systems:


• A homogeneous system $A\mathbf{x} = \mathbf{0}$ is always consistent.


Why? Because $A\mathbf{0} = \mathbf{0}$, so the system always has the
solution $\mathbf{x} = \mathbf{0}$. For this reason,
$\mathbf{x} = \mathbf{0}$ is called the trivial solution.
The following fact can now be seen:


• If $A\mathbf{x} = \mathbf{0}$ has a unique solution, then it must be the
trivial solution, $\mathbf{x} = \mathbf{0}$.


If we form the augmented matrix, (A | 0), of a homogeneous system, 
then the last column will consist entirely of zeros. This column will 
remain a column of zeros throughout the entire row reduction, so there 
is no point in writing it. Instead, we use Gaussian elimination on the 
coefficient matrix A, remembering that we are solving Ax = 0. 


Example 2.19 Find the solution of the homogeneous linear system, 


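$$\begin{aligned}
x + y + 3z + w &= 0\\
x - y + z + w &= 0\\
y + 2z + 2w &= 0.
\end{aligned}$$


We reduce the coefficient matrix $A$ to reduced row echelon form,


$$\begin{aligned}
A = \begin{pmatrix} 1 & 1 & 3 & 1\\ 1 & -1 & 1 & 1\\ 0 & 1 & 2 & 2 \end{pmatrix}
&\longrightarrow
\begin{pmatrix} 1 & 1 & 3 & 1\\ 0 & -2 & -2 & 0\\ 0 & 1 & 2 & 2 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 1 & 3 & 1\\ 0 & 1 & 1 & 0\\ 0 & 1 & 2 & 2 \end{pmatrix}\\
&\longrightarrow
\begin{pmatrix} 1 & 1 & 3 & 1\\ 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 2 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 1 & 0 & -5\\ 0 & 1 & 0 & -2\\ 0 & 0 & 1 & 2 \end{pmatrix}\\
&\longrightarrow
\begin{pmatrix} 1 & 0 & 0 & -3\\ 0 & 1 & 0 & -2\\ 0 & 0 & 1 & 2 \end{pmatrix}.
\end{aligned}$$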


Activity 2.20 Work through the above calculation and state what row
operation is being done at each stage. For example, the first operation
is $R_2 - R_1$.


Now we write down the solution from the reduced row echelon form
of the matrix. (Remember that this is the reduced row echelon form of the
coefficient matrix $A$, representing the homogeneous system
$A\mathbf{x} = \mathbf{0}$.)


The solution is


$$\mathbf{x} = \begin{pmatrix} x\\ y\\ z\\ w \end{pmatrix}
= t\begin{pmatrix} 3\\ 2\\ -2\\ 1 \end{pmatrix}, \quad t \in \mathbb{R},$$


which is a line through the origin, $\mathbf{x} = t\mathbf{v}$, with
$\mathbf{v} = (3, 2, -2, 1)^T$. There are infinitely many solutions, one
for every $t \in \mathbb{R}$.


This example illustrates the following fact. 


Theorem 2.21 If $A$ is an $m \times n$ matrix with $m < n$, then
$A\mathbf{x} = \mathbf{0}$ has infinitely many solutions.


Proof. The system is always consistent (because it is homogeneous)
and the solutions are found by reducing the coefficient matrix $A$. If $A$
is $m \times n$, then the reduced row echelon form of $A$ contains at most
$m$ leading ones, so there are at most $m$ leading variables. Therefore,
there must be at least $n - m$ non-leading variables. Since $m < n$,
$n - m > 0$, which means $n - m \geq 1$. This says that there is at least
one non-leading variable. So the solution involves at least one arbitrary
parameter which can take on any real value. Hence, there are infinitely
many solutions.


What about a linear system Ax = b? If A is m x n with m < n, does 
Ax = b have infinitely many solutions? The answer is: provided the 
system is consistent, then there are infinitely many solutions. So the 
system either has no solutions, or infinitely many. The following exam- 
ples demonstrate both possibilities. 


Example 2.22 The linear system 


$$\begin{aligned}
x + y + z &= 6\\
x + y + z &= 1
\end{aligned}$$


is inconsistent, since there are no values of $x, y, z$ which can satisfy
both equations. These equations represent parallel planes in $\mathbb{R}^3$.


Example 2.23 On the other hand, consider the system 

$$\begin{aligned}
x + y + 3z + w &= 2\\
x - y + z + w &= 4\\
y + 2z + 2w &= 0.
\end{aligned}$$

We will show that this is consistent and has infinitely many solutions.
Notice that the coefficient matrix of this linear system is the same matrix
$A$ as that used in Example 2.19.

The augmented matrix is


$$(A|\mathbf{b}) = \left(\begin{array}{cccc|c}
1 & 1 & 3 & 1 & 2\\
1 & -1 & 1 & 1 & 4\\
0 & 1 & 2 & 2 & 0
\end{array}\right).$$


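Activity 2.24 Show that the reduced row echelon form of the augmented
matrix is

$$\left(\begin{array}{cccc|c}
1 & 0 & 0 & -3 & 1\\
0 & 1 & 0 & -2 & -2\\
0 & 0 & 1 & 2 & 1
\end{array}\right).$$


Then write down the solution.


The general solution of this system,


$$\mathbf{x} = \begin{pmatrix} x\\ y\\ z\\ w \end{pmatrix}
= \begin{pmatrix} 1\\ -2\\ 1\\ 0 \end{pmatrix}
+ t\begin{pmatrix} 3\\ 2\\ -2\\ 1 \end{pmatrix}
= \mathbf{p} + t\mathbf{v}, \quad t \in \mathbb{R},$$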


is a line which does not go through the origin. It is parallel to the line 
of solutions of the homogeneous system, Ax = 0, and goes through 
the point determined by p. This should come as no surprise, since the 
coefficient matrix forms the first four columns of the augmented matrix. 
Compare the solution sets: 


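$$\begin{array}{ll}
A\mathbf{x} = \mathbf{0}: & A\mathbf{x} = \mathbf{b}:\\[6pt]
\text{RREF}(A) = \begin{pmatrix} 1 & 0 & 0 & -3\\ 0 & 1 & 0 & -2\\ 0 & 0 & 1 & 2 \end{pmatrix}
&
\text{RREF}(A|\mathbf{b}) = \left(\begin{array}{cccc|c}
1 & 0 & 0 & -3 & 1\\
0 & 1 & 0 & -2 & -2\\
0 & 0 & 1 & 2 & 1
\end{array}\right)\\[6pt]
\mathbf{x} = t\begin{pmatrix} 3\\ 2\\ -2\\ 1 \end{pmatrix}
&
\mathbf{x} = \begin{pmatrix} 1\\ -2\\ 1\\ 0 \end{pmatrix}
+ t\begin{pmatrix} 3\\ 2\\ -2\\ 1 \end{pmatrix}
\end{array}$$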

The reduced row echelon form of the augmented matrix of a system
$A\mathbf{x} = \mathbf{b}$ will always contain the information needed to
solve $A\mathbf{x} = \mathbf{0}$, since the matrix $A$ is the first part of
$(A|\mathbf{b})$. We therefore have the following definition.


Definition 2.25 (Associated homogeneous system) Given a system
of linear equations, $A\mathbf{x} = \mathbf{b}$, the linear system
$A\mathbf{x} = \mathbf{0}$ is called the associated homogeneous system.


The solutions of the associated homogeneous system form an important 
part of the solution of the system Ax = b, as we shall see in the next 
section. 


Activity 2.26 Look at the reduced row echelon form of A in 
Example 2.19, 
$$\begin{pmatrix} 1 & 0 & 0 & -3\\ 0 & 1 & 0 & -2\\ 0 & 0 & 1 & 2 \end{pmatrix}.$$


Explain why you can tell from this matrix that for all
$\mathbf{b} \in \mathbb{R}^3$, the linear system
$A\mathbf{x} = \mathbf{b}$ is consistent with infinitely many solutions.


Activity 2.27 Solve the system of equations Ax = b given by 


$$\begin{aligned}
x_1 + 2x_2 + x_3 &= 1\\
2x_1 + 2x_2 &= 2\\
3x_1 + 4x_2 + x_3 &= 2.
\end{aligned}$$


Find also the general solution of the associated homogeneous system, 
Ax = 0. Describe the configuration of intersecting planes for each sys- 
tem of equations (Ax = b and Ax = 0). 


2.4.2 Null space 


It is clear from what we have just seen that the general solution to 
a consistent linear system Ax = b involves solutions to the system 
Ax = 0. This set of solutions is given a special name: the null space or 
kernel of the matrix A. This null space, denoted N(A), is the set of all 
solutions x to Ax = 0, where 0 is the zero vector. That is: 


Definition 2.28 (Null space) For an $m \times n$ matrix $A$, the null
space of $A$ is the subset of $\mathbb{R}^n$ given by


$$N(A) = \{\mathbf{x} \in \mathbb{R}^n \mid A\mathbf{x} = \mathbf{0}\},$$


where $\mathbf{0} = (0, 0, \ldots, 0)^T$ is the zero vector of $\mathbb{R}^m$.


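We now formalise the connection between the solution set of a consistent
linear system, and the null space of the coefficient matrix of the system.


Theorem 2.29 Suppose that $A$ is an $m \times n$ matrix, that
$\mathbf{b} \in \mathbb{R}^m$ and that the system
$A\mathbf{x} = \mathbf{b}$ is consistent. Suppose that $\mathbf{p}$ is any
solution of $A\mathbf{x} = \mathbf{b}$. Then the set of all solutions of
$A\mathbf{x} = \mathbf{b}$ consists precisely of the vectors
$\mathbf{p} + \mathbf{z}$ for $\mathbf{z} \in N(A)$, that is,


$$\{\mathbf{x} \mid A\mathbf{x} = \mathbf{b}\}
= \{\mathbf{p} + \mathbf{z} \mid \mathbf{z} \in N(A)\}.$$


Proof. To show the two sets are equal, we show that each is a subset
of the other. This means showing that $\mathbf{p} + \mathbf{z}$ is a
solution for any $\mathbf{z}$ in the null space of $A$, and that all
solutions, $\mathbf{x}$, of $A\mathbf{x} = \mathbf{b}$ are of the form
$\mathbf{p} + \mathbf{z}$ for some $\mathbf{z} \in N(A)$.

We start with $\mathbf{p} + \mathbf{z}$. If $\mathbf{z} \in N(A)$, then


$$A(\mathbf{p} + \mathbf{z}) = A\mathbf{p} + A\mathbf{z}
= \mathbf{b} + \mathbf{0} = \mathbf{b},$$


so $\mathbf{p} + \mathbf{z}$ is a solution of $A\mathbf{x} = \mathbf{b}$;
that is, $\mathbf{p} + \mathbf{z} \in \{\mathbf{x} \mid A\mathbf{x} = \mathbf{b}\}$.
This shows that


$$\{\mathbf{p} + \mathbf{z} \mid \mathbf{z} \in N(A)\}
\subseteq \{\mathbf{x} \mid A\mathbf{x} = \mathbf{b}\}.$$


Conversely, suppose that $\mathbf{x}$ is any solution of
$A\mathbf{x} = \mathbf{b}$. Because $\mathbf{p}$ is also a solution, we
have $A\mathbf{p} = \mathbf{b}$ and


$$A(\mathbf{x} - \mathbf{p}) = A\mathbf{x} - A\mathbf{p}
= \mathbf{b} - \mathbf{b} = \mathbf{0},$$


so the vector $\mathbf{z} = \mathbf{x} - \mathbf{p}$ is a solution of the
system $A\mathbf{z} = \mathbf{0}$; in other words, $\mathbf{z} \in N(A)$.
But then $\mathbf{x} = \mathbf{p} + \mathbf{z}$, where
$\mathbf{z} \in N(A)$. This shows that all solutions are of the form
$\mathbf{p} + \mathbf{z}$ for some $\mathbf{z} \in N(A)$; that is,


$$\{\mathbf{x} \mid A\mathbf{x} = \mathbf{b}\}
\subseteq \{\mathbf{p} + \mathbf{z} \mid \mathbf{z} \in N(A)\}.$$


So the two sets are equal, as required.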


The above result is the ‘Principle of Linearity’. It says that the gen- 
eral solution of a consistent linear system Ax = b is equal to any one 
particular solution p (where Ap = b) plus the general solution of the 
associated homogeneous system. 


{solutions of Ax = b} = p + {solutions of Ax = 0}. 
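
The two inclusions in the proof can be illustrated numerically with the system of Example 2.23. The Python sketch below uses names of our own choosing (`times`, `p_plus`). It checks that the particular solution plus any multiple of the null space vector solves the system, and that the difference of two solutions lies in the null space.

```python
# Data from Example 2.23: coefficient matrix, right-hand side,
# a particular solution p, and a vector v spanning the solutions of Ax = 0.
A = [[1, 1, 3, 1], [1, -1, 1, 1], [0, 1, 2, 2]]
b = [2, 4, 0]
p = [1, -2, 1, 0]
v = [3, 2, -2, 1]


def times(matrix, x):
    """Matrix-vector product, one entry per row of `matrix`."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in matrix]


def p_plus(t):
    """The vector p + t*v."""
    return [pi + t * vi for pi, vi in zip(p, v)]


print(times(A, p) == b)                                      # True: Ap = b
print(times(A, v) == [0, 0, 0])                              # True: v is in N(A)
print(all(times(A, p_plus(t)) == b for t in range(-5, 6)))   # True

# Conversely, the difference of two solutions lies in the null space.
difference = [a - c for a, c in zip(p_plus(4), p_plus(-7))]
print(times(A, difference) == [0, 0, 0])                     # True
```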


In light of this result, let’s have another look at some of the examples 
we worked earlier. In Example 2.23, we observed that the solutions of 


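$$\begin{aligned}
x + y + 3z + w &= 2\\
x - y + z + w &= 4\\
y + 2z + 2w &= 0
\end{aligned}$$

are of the form


$$\mathbf{x} = \begin{pmatrix} x\\ y\\ z\\ w \end{pmatrix}
= \begin{pmatrix} 1\\ -2\\ 1\\ 0 \end{pmatrix}
+ t\begin{pmatrix} 3\\ 2\\ -2\\ 1 \end{pmatrix}
= \mathbf{p} + t\mathbf{v}, \quad t \in \mathbb{R},$$


where $\mathbf{x} = t\mathbf{v}$ is the general solution we had found of
the associated homogeneous system (in Example 2.19). It is clear that
$\mathbf{p}$ is a particular solution of the linear system (take $t = 0$),
so this solution is of the form described in the theorem.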

Now refer back to the first two examples Ax = b and Bx = b, 
which we worked through in Section 2.3.1. (For convenience, we’ll call 
the variables x, y, z rather than x1, x2, x3.) 


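$$\begin{cases}
x + y + z = 3\\
2x + y + z = 4\\
x - y + 2z = 5
\end{cases}
\qquad\qquad
\begin{cases}
2z = 3\\
2y + 3z = 4\\
z = 5.
\end{cases}$$

The echelon forms of the augmented matrices we found were

$$(A|\mathbf{b}) \longrightarrow \left(\begin{array}{ccc|c}
1 & 0 & 0 & 1\\
0 & 1 & 0 & 0\\
0 & 0 & 1 & 2
\end{array}\right),
\qquad
(B|\mathbf{b}) \longrightarrow \left(\begin{array}{ccc|c}
0 & 1 & \frac{3}{2} & 2\\
0 & 0 & 1 & 5\\
0 & 0 & 0 & 1
\end{array}\right).$$


The first system, $A\mathbf{x} = \mathbf{b}$, has a unique solution,
$\mathbf{p} = (1, 0, 2)^T$, and the second system,
$B\mathbf{x} = \mathbf{b}$, is inconsistent.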

The reduced row echelon form of the matrix $A$ is the identity matrix
(formed from the first three columns of the reduced augmented matrix).
Therefore, the homogeneous system $A\mathbf{x} = \mathbf{0}$ will only have
the trivial solution. The unique solution of $A\mathbf{x} = \mathbf{b}$ is
of the form $\mathbf{x} = \mathbf{p} + \mathbf{0}$, which conforms with the
Principle of Linearity.

This principle does not apply to the inconsistent system Bx = b. 
However, the associated homogeneous system is consistent. Notice that 
the homogeneous system is 


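$$\begin{cases}
2z = 0\\
2y + 3z = 0\\
z = 0,
\end{cases}$$


which represents the intersection of two planes, since the equations
$2z = 0$ and $z = 0$ each represent the $xy$-plane. To find the solution,
we continue to reduce the matrix $B$ to reduced row echelon form.


$$B \longrightarrow
\begin{pmatrix} 0 & 1 & \frac{3}{2}\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 0 & 1 & 0\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{pmatrix}.$$

The non-leading variable is $x$, so we set $x = t$, and the solution is


$$\mathbf{x} = \begin{pmatrix} t\\ 0\\ 0 \end{pmatrix}
= t\begin{pmatrix} 1\\ 0\\ 0 \end{pmatrix}, \quad t \in \mathbb{R},$$


which is a line through the origin; namely, the $x$ axis. So the plane
$2y + 3z = 0$ intersects the $xy$-plane along the $x$ axis.

We summarise what we have noticed so far:


• If $A\mathbf{x} = \mathbf{b}$ is consistent, the solutions are of the
form $\mathbf{x} = \mathbf{p} + \mathbf{z}$, where $\mathbf{p}$ is any one
particular solution and $\mathbf{z} \in N(A)$, the null space of $A$.
• If $A\mathbf{x} = \mathbf{b}$ has a unique solution, then
$A\mathbf{x} = \mathbf{0}$ has only the trivial solution.
• If $A\mathbf{x} = \mathbf{b}$ has infinitely many solutions, then
$A\mathbf{x} = \mathbf{0}$ has infinitely many solutions.
• $A\mathbf{x} = \mathbf{b}$ may be inconsistent, but
$A\mathbf{x} = \mathbf{0}$ is always consistent.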


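Activity 2.30 Look at the example we solved in Section 2.3.3.

$$\begin{aligned}
x_1 + x_2 + x_3 + x_4 + x_5 &= 3\\
2x_1 + x_2 + x_3 + x_4 + 2x_5 &= 4\\
x_1 - x_2 - x_3 + x_4 + x_5 &= 5\\
x_1 + x_4 + x_5 &= 4.
\end{aligned}$$


Show that the solution we found is of the form


$$\mathbf{x} = \mathbf{p} + s\mathbf{v} + t\mathbf{w}, \quad s, t \in \mathbb{R},$$


where $\mathbf{p}$ is a particular solution of $A\mathbf{x} = \mathbf{b}$
and $s\mathbf{v} + t\mathbf{w}$ is a general solution of the associated
homogeneous system $A\mathbf{x} = \mathbf{0}$.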


2.5 Learning outcomes 


You should now be able to: 


• express a system of linear equations in matrix form as
$A\mathbf{x} = \mathbf{b}$ and know what is meant by the coefficient matrix
and the augmented matrix

• put a matrix into reduced row echelon form using row operations
and following the algorithm

• recognise consistent and inconsistent systems of equations by the
row echelon form of the augmented matrix

• solve a system of $m$ linear equations in $n$ unknowns using Gaussian
elimination

• express the solution in vector form

• interpret systems with three unknowns as intersections of planes in
$\mathbb{R}^3$

• say what is meant by a homogeneous system of equations and what
is meant by the associated homogeneous system of any linear system
of equations

• explain why the solution of a consistent system of linear equations
$A\mathbf{x} = \mathbf{b}$ is the sum of a particular solution and a
general solution of the associated homogeneous system

• say what is meant by the null space of a matrix.


2.6 Comments on activities 


Activity 2.13 Put the augmented matrix into reduced row echelon form. 
It should take five steps: 


\[
\left(\begin{array}{ccc|c} 1 & 1 & 1 & 6\\ 2 & 4 & 1 & 5\\ 2 & 3 & 1 & 6 \end{array}\right)
\longrightarrow (1) \longrightarrow (2) \longrightarrow (3) \longrightarrow (4) \longrightarrow
\left(\begin{array}{ccc|c} 1 & 0 & 0 & 2\\ 0 & 1 & 0 & -1\\ 0 & 0 & 1 & 5 \end{array}\right),
\]

from which you can read the solution, x = (2, −1, 5)^T. We will state the
row operations at each stage. To obtain (1), do R2 − 2R1 and R3 − 2R1;
for (2) switch R2 and R3; for (3) do R3 − 2R2. The augmented matrix
is now in row echelon form, so starting with the bottom row, for (4),
do R2 + R3 and R1 − R3. The final operation, R1 − R2, will yield the
matrix in reduced row echelon form.
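
The reduction can also be checked mechanically. The following is a minimal sketch of Gauss-Jordan elimination with exact fractions; the function name `rref` is an assumption made for illustration. It clears each pivot column in a single pass rather than following the exact order of the five steps listed above, but since the reduced row echelon form of a matrix is unique, the final matrix is the same.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    M = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(n_cols):
        # Find a row at or below pivot_row with a non-zero entry in this column.
        pivot = next((r for r in range(pivot_row, n_rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]   # interchange rows
        lead = M[pivot_row][col]
        M[pivot_row] = [x / lead for x in M[pivot_row]]   # make the leading entry 1
        for r in range(n_rows):                           # clear the rest of the column
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return M

augmented = [[1, 1, 1, 6],
             [2, 4, 1, 5],
             [2, 3, 1, 6]]
reduced = rref(augmented)
solution = [row[-1] for row in reduced]   # last column holds the unique solution
```

Using `Fraction` instead of floating-point numbers keeps every intermediate entry exact, so the result can be compared directly with a hand calculation.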


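Activity 2.15 Multiply the matrices below as instructed to obtain b:

\[
A\mathbf{x} =
\begin{pmatrix} 1 & 1 & 1 & 1 & 1\\ 2 & 1 & 1 & 1 & 2\\ 1 & -1 & -1 & 1 & 1\\ 1 & 0 & 0 & 1 & 1 \end{pmatrix}
\begin{pmatrix} 1 - t\\ -1 - s\\ s\\ 3\\ t \end{pmatrix}
= \begin{pmatrix} 3\\ 4\\ 5\\ 4 \end{pmatrix} = \mathbf{b}.
\]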
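

Activity 2.16 The equations are:

\[
\begin{aligned}
x_1 + 2x_2 + x_3 &= 3\\
2x_1 + 2x_2 &= 2\\
3x_1 + 4x_2 + x_3 &= 5.
\end{aligned}
\]

Put the augmented matrix into reduced row echelon form:

\[
\left(\begin{array}{ccc|c} 1 & 2 & 1 & 3\\ 2 & 2 & 0 & 2\\ 3 & 4 & 1 & 5 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} 1 & 2 & 1 & 3\\ 0 & -2 & -2 & -4\\ 0 & -2 & -2 & -4 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} 1 & 2 & 1 & 3\\ 0 & 1 & 1 & 2\\ 0 & 0 & 0 & 0 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} 1 & 0 & -1 & -1\\ 0 & 1 & 1 & 2\\ 0 & 0 & 0 & 0 \end{array}\right).
\]

So we have solution

\[
\mathbf{x} = \begin{pmatrix} -1 + t\\ 2 - t\\ t \end{pmatrix}
= \begin{pmatrix} -1\\ 2\\ 0 \end{pmatrix} + t\begin{pmatrix} 1\\ -1\\ 1 \end{pmatrix}
\]

for t ∈ R. This is the equation of a line in R^3. So the three planes
intersect in a line.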


Activity 2.26 This is the reduced row echelon form of the coefficient 
matrix, A. The reduced row echelon form of any augmented matrix, 
(A|b), will have as its first four columns the same four columns. As 
there is a leading one in every row, it is impossible to have a row of the 
form (0 0 ... 0 1), so the system will be consistent. There will be one 
free (non-leading) variable, (fourth column, say x4 = t), so there will 
be infinitely many solutions. 


Activity 2.27 Using row operations to reduce the augmented matrix to 
echelon form, we obtain 


1 2 1 1 1 2 1 1 
k 2 0 2 > (o —2 -2 0 ) 
3 4 1 2 0 —2 -2 -l1 


1 2 1 1 12 1 1 
> (0 1 1 o) > (0 1 1 0). 
0 -2 -2 -l 0 0 0 -I 


There is no reason to reduce the matrix further, for we can now conclude 
that the original system of equations is inconsistent: there is no solution. 
For the homogeneous system, Ax = 0, the row echelon form of A
consists of the first three columns of the echelon form of the augmented
matrix. So starting from these and continuing to reduced row echelon
form, we obtain


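\[
A = \begin{pmatrix} 1 & 2 & 1\\ 2 & 2 & 0\\ 3 & 4 & 1 \end{pmatrix}
\longrightarrow \cdots \longrightarrow
\begin{pmatrix} 1 & 2 & 1\\ 0 & 1 & 1\\ 0 & 0 & 0 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 0 & -1\\ 0 & 1 & 1\\ 0 & 0 & 0 \end{pmatrix}.
\]

Setting the non-leading variable x3 to x3 = t, we find that the null space
of A consists of all vectors, x, of the following form:

\[
\mathbf{x} = t\begin{pmatrix} 1\\ -1\\ 1 \end{pmatrix}, \quad t \in \mathbb{R}.
\]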


The system of equations Ax = 0 has infinitely many solutions. 

Geometrically, the associated homogeneous system represents the 
equations of three planes, all of which pass through the origin. These 
planes intersect in a line through the origin. The equation of this line is 
given by the solution we found. 

The original system represents three planes with no common points 
of intersection. No two of the planes in either system are parallel. Why? 
Look at the normals to the planes: no two of these are parallel, so 
no two planes are parallel. These planes intersect to form a kind of 
triangular prism; any two planes intersect in a line, and the three lines 
of intersection are parallel, but there are no points which lie on all three 
planes. (If you have trouble visualising this, take three cards, place one 
flat on the table, and then get the other two to balance on top, forming 
a triangle when viewed from the side.) 




2.7 Exercises 


Exercise 2.1 Write down the augmented matrix for each of the follow- 
ing systems of equations, and use it to solve the system by reducing the 
augmented matrix to reduced row echelon form. 


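\[
\text{(a)}\quad
\begin{cases}
x - y + z = -3\\
-3x + 4y - z = 2\\
x - 3y - 2z = 7
\end{cases}
\qquad
\text{(b)}\quad
\begin{cases}
2x - y + 3z = 4\\
x + y - z = 1\\
5x + 2y = 7.
\end{cases}
\]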


Interpret the solutions to each of the above systems as intersections of 
planes, describing them geometrically. 


Exercise 2.2 Solve each of the following systems of equations.

\[
\text{(a)}\quad
\begin{cases}
-x + y - 3z = 0\\
3x - 2y + 10z = 0\\
-2x + 3y - 5z = 0
\end{cases}
\qquad
\text{(b)}\quad
\begin{cases}
-x + y - 3z = 6\\
3x - 2y + 10z = -10\\
-2x + 3y - 5z = 9.
\end{cases}
\]


Exercise 2.3 Find the vector equation of the line of intersection of the 
planes 


\[
3x_1 + x_2 + x_3 = 3 \quad\text{and}\quad x_1 - x_2 - x_3 = 1.
\]

What is the intersection of these two planes and the plane

\[
x_1 + 2x_2 + 2x_3 = 1?
\]


Exercise 2.4 Solve the system of equations Ax = b, where 


\[
A = \begin{pmatrix} 2 & 3 & 1 & 1\\ 1 & 2 & 0 & -1\\ 3 & 4 & 2 & 4 \end{pmatrix},
\qquad
\mathbf{b} = \begin{pmatrix} 4\\ 1\\ 9 \end{pmatrix},
\]

using Gaussian elimination. (Put the augmented matrix into reduced
row echelon form.) Express your solution in vector form as x = p + tv,
where t is a real number. Check your solution by calculating Ap and
Av.

Write down the reduced row echelon form of the matrix A. Refer- 
ring to this reduced row echelon form, answer the following two ques- 
tions and justify each answer. 


(i) Is there a vector d ∈ R^3 for which Ax = d is inconsistent?
(ii) Is there a vector d ∈ R^3 for which Ax = d has a unique solution?


Exercise 2.5 Find the reduced row echelon form of the matrix 


\[
C = \begin{pmatrix} 1 & 2 & -1 & 3 & 8\\ -3 & -1 & 8 & 6 & 1\\ -1 & 0 & 3 & 1 & -2 \end{pmatrix}.
\]


(a) If C is the augmented matrix of a system of equations Ax = b,
C = (A|b), what are the solutions? What Euclidean space are they
in?

(b) If C is the coefficient matrix of a homogeneous system of
equations, Cx = 0, what are the solutions? What Euclidean space are
they in?

(c) Let w = (1, 0, 1, 1, 1)^T. Find d such that Cw = d. Then write
down a general solution of Cx = d.


Exercise 2.6 Find the null space of the matrix
1 3 
4 1.2 
1 5 
3 =5 1 


Let c1, c2, c3 denote the columns of B. Find d = c1 + 2c2 − c3. Then
write down all solutions of Bx = d.


2.8 Problems 


Problem 2.1 Write down the augmented matrix for each of the follow- 
ing systems of equations, and use it to solve the system by reducing the 
augmented matrix to reduced row echelon form. 


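\[
\text{(a)}\quad
\begin{cases}
x + y + z = 2\\
2y + z = 0\\
x + y - z = -4.
\end{cases}
\qquad
\text{(b)}\quad
\begin{cases}
x + y + 2z = 2\\
2y + z = 0\\
-x + y - z = 0.
\end{cases}
\]

\[
\text{(c)}\quad
\begin{cases}
x + y + 2z = 2\\
2y + z = 0\\
x + y - z = -2.
\end{cases}
\qquad
\text{(d)}\quad
\begin{cases}
-3x - y + z = 0\\
-2x + 3y + 2z = 0\\
x + 2y + 3z = 0.
\end{cases}
\]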


Interpret the solutions to each of the above systems as intersections of 
planes, describing them geometrically. 


Problem 2.2 Find the general solution of the following system of linear 
equations using Gaussian elimination. 


Mae ay +xs = 1 
3x1 + 3x2 + 6x3 + 3x4 + 9x5 = 6 
2x1 + 2x2 + 4x3 + x4 + 6x5 = 5. 


The general solution should be expressed in vector form. 


Problem 2.3 Express the following system of equations in matrix form:

\[
\begin{aligned}
-5x + y - 3z - 8w &= 3\\
3x + 2y + 2z + 5w &= 3\\
x + z + 2w &= -1.
\end{aligned}
\]

Show that the system is consistent and find the general solution.


Then write down the general solution of the associated homoge- 
neous system of equations. 


Problem 2.4 Given that the matrix below is in reduced row echelon 
form, find the missing entries (as indicated by *). Replace every * which 
has to be a 0 with a 0. Replace every * which has to be a 1 with a 1. 
Replace every * which does not have to be either a 0 or a 1 with a 2. 


koko ko ko Ox 
gl l x x A). 
xo * x —4 3 


If C is the reduced row echelon form of the augmented matrix of a 
system of linear equations, Ax = b, then write down the solution of the 
system in vector form. 

If C is the reduced row echelon form of a matrix B, write down the 
general solution of Bx = 0. 


Problem 2.5 Consider the following matrices and vector: 


3 1 -l 1 0 5 3 2 11 
(1 1 o). s=( 0 2422), »=(2). 

2 ile 2 -1 55 01 —6 

Solve each of the systems Ax = b and Bx = b using Gaussian
elimination, and express your solution in vector form.


Problem 2.6 Put the matrix 


— = = Re 
NUUR 
NNUA 


into reduced row echelon form. 


(a) The homogeneous system of equations Bx = 0 represents how 
many equations in how many unknowns? Is there a non-trivial 
solution? If so, find the general solution of Bx = 0. 

(b) Is there a vector b ∈ R^4 for which Bx = b is inconsistent? Write
down such a vector b if one exists and verify that Bx = b is
inconsistent.

(c) Write down a vector d ∈ R^4 for which Bx = d is consistent. Then
write down the general solution of Bx = d.


Problem 2.7 Let 


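\[
A = \begin{pmatrix} 4 & -1 & 1\\ -1 & 4 & -1\\ 1 & -1 & 4 \end{pmatrix},
\qquad
\mathbf{x} = \begin{pmatrix} x\\ y\\ z \end{pmatrix}.
\]

Write out the system of linear equations Ax = 6x and find all its
solutions.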


Problem 2.8 Let 


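\[
B = \begin{pmatrix} 1 & 0 & 1\\ 0 & 1 & 1\\ -1 & 0 & 3\\ 3 & 1 & 2 \end{pmatrix},
\qquad
\mathbf{b} = \begin{pmatrix} a\\ b\\ c\\ d \end{pmatrix}.
\]

Find an equation which the components a, b, c, d of the vector b must
satisfy for the system of equations Bx = b to be consistent.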

If Bx =b is consistent for a given vector b, will the solution be 
unique? Justify your answer. 


Problem 2.9 (Portfolio theory) A portfolio is a row vector

\[
Y = (\, y_1 \;\; \dots \;\; y_m \,)
\]

in which y_i is the number of units of asset i held by an investor. After a
year, say, the value of the assets will increase (or decrease) by a certain
percentage. The change in each asset depends on states the economy
will assume, predicted as a returns matrix, R = (r_ij), where r_ij is the
factor by which investment i changes in one year if state j occurs.

Suppose an investor has assets in y_1 = land, y_2 = bonds and
y_3 = stocks, and that the returns matrix is

\[
R = \begin{pmatrix} 1.05 & 0.95 & 1.0\\ 1.05 & 1.05 & 1.05\\ 1.20 & 1.26 & 1.23 \end{pmatrix}.
\]

Then the total values of the portfolio in one year’s time are given by
YR, where (YR)_j is the total value of the portfolio if state j occurs.

(a) Find the total values of the portfolio W = (5000 2000 0) in
one year for each of the possible states.

(b) Show that U = (600 8000 1000) is a riskless portfolio; that
is, it has the same value in all states j.

(c) An arbitrage portfolio, Y = (y_1 ... y_m), is one which costs
nothing (y_1 + ⋯ + y_m = 0), cannot lose ((YR)_j ≥ 0 for all j),
and in at least one state makes a profit ((YR)_j > 0 for some j).
Show that Z = (1000 −2000 1000) is an arbitrage portfolio.
(The bond asset of −2000 indicates that this sum was borrowed
from the bank.)
Can you find a more profitable arbitrage vector than this one?

(d) Let R be an m × n returns matrix, and let u = (1, 1, ..., 1)^T ∈ R^m.
If the system Rx = u has a solution p = (p_1, ..., p_n)^T with
p_i > 0, then the components p_i of p are called state prices and
the investor is said to be taking part in a ‘fair game’. If state prices
exist, then there are no arbitrage vectors for R, and if state prices
do not exist, then arbitrage vectors do exist. Show that state prices
for the given matrix, R, do not exist.


Problem 2.10 (Conditioning of matrices) Some systems of linear 
equations lead to ‘ill-conditioned’ matrices. These occur if a small 
difference in the coefficients or constants yields a large difference in the 
solution, in particular when the numbers involved are decimal approx- 
imations of varying degree of accuracy. 


The following two systems of equations represent the same problem,
the first to two decimal places accuracy, the second to three decimal
places accuracy. Solve them using Gaussian elimination and note the
significant difference in the solutions you obtain.

\[
\begin{cases}
x + y = 51.11\\
x + 1.02y = 2.22
\end{cases}
\qquad
\begin{cases}
x + y = 51.106\\
x + 1.016y = 2.218.
\end{cases}
\]




3 


Matrix inversion and 
determinants 


In this chapter, all matrices will be square n x n matrices, unless explic- 
itly stated otherwise. Only a square matrix can have an inverse, and the 
determinant is only defined for a square matrix. 

We want to answer the following two questions: When is a matrix 
A invertible? How can we find the inverse matrix? 


3.1 Matrix inverse using row operations 
3.1.1 Elementary matrices 


Recall the three elementary row operations: 


RO1 multiply a row by a non-zero constant. 
RO2 interchange two rows. 
RO3 add a multiple of one row to another. 


These operations change a matrix into a new matrix. We want to examine
this process more closely. Let A be an n × n matrix and let A_i
denote the ith row of A. Then we can write A as a column of n
rows,

\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
= \begin{pmatrix} A_1\\ A_2\\ \vdots\\ A_n \end{pmatrix}.
\]

We use this row notation to indicate row operations. For example, what
row operations are indicated below?

\[
\begin{pmatrix} A_1\\ 3A_2\\ \vdots\\ A_n \end{pmatrix},
\qquad
\begin{pmatrix} A_2\\ A_1\\ \vdots\\ A_n \end{pmatrix},
\qquad
\begin{pmatrix} A_1\\ A_2 + 4A_1\\ \vdots\\ A_n \end{pmatrix}.
\]

The first is multiply row 2 by 3, the second is interchange row 1 and row
2, and the third is add 4 times row 1 to row 2. Each of these represents
the new matrix obtained after the row operation has been executed.

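Now look at a product of two n × n matrices A and B. The (1, 1)
entry in the product is the inner product of row 1 of A and column 1 of
B. The (1, 2) entry is the inner product of row 1 of A and column 2 of
B, and so on. In fact, row 1 of the product matrix AB is obtained by
taking the product of the row A_1 with the matrix B; that is, A_1 B. This
is true of each row of the product; that is, each row i of the product AB
is obtained by calculating A_i B. So we can express the product AB as

\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1n}\\
b_{21} & b_{22} & \cdots & b_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
b_{n1} & b_{n2} & \cdots & b_{nn}
\end{pmatrix}
= \begin{pmatrix} A_1B\\ A_2B\\ \vdots\\ A_nB \end{pmatrix}.
\]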


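Now consider the effect of a row operation on a product AB. The first
matrix below is the product AB after the row operation ‘add 4 times
row 1 of AB to row 2 of AB’:

\[
\begin{pmatrix} A_1B\\ A_2B + 4A_1B\\ \vdots\\ A_nB \end{pmatrix}
= \begin{pmatrix} A_1B\\ (A_2 + 4A_1)B\\ \vdots\\ A_nB \end{pmatrix}
= \begin{pmatrix} A_1\\ A_2 + 4A_1\\ \vdots\\ A_n \end{pmatrix} B.
\]

In the second matrix, we have used the distributive rule to write

\[
A_2B + 4A_1B = (A_2 + 4A_1)B.
\]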


But compare this matrix to the row form of a product of two matrices 
given above. You can see that it is what would result if we took the 
matrix obtained from A after the same row operation, and multiplied 
that by B. 

We have shown that the matrix obtained by the row operation, ‘add
4 times row 1 to row 2’ on the product AB is equal to the product of
the matrix obtained by the same row operation on A, with the matrix
B. The same argument works for any row operation in general, so
that

(matrix obtained by a row operation on AB)
    = (matrix obtained by a row operation on A)B.

This is true for any n × n matrices A and B.
Now take A = I, the identity matrix. Since IB = B, the previous
statement now says that:

• The matrix obtained by a row operation on B = (the matrix obtained
  by a row operation on I)B.


This leads us to the following definition: 


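Definition 3.1 (Elementary matrix) An elementary matrix, E, is an
n × n matrix obtained by doing exactly one row operation on the n × n
identity matrix, I.

For example,

\[
\begin{pmatrix} 1 & 0 & 0\\ 0 & 3 & 0\\ 0 & 0 & 1 \end{pmatrix},
\qquad
\begin{pmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix},
\qquad
\begin{pmatrix} 1 & 0 & 0\\ 4 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\]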


are elementary matrices. The first has had row 2 multiplied by 3, the 
second has had row 1 and row 2 interchanged, and the last matrix has 
had 4 times row 1 added to row 2. 


Activity 3.2 Which of the matrices below are elementary matrices? 


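\[
\begin{pmatrix} 2 & 1 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix},
\qquad
\begin{pmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ -1 & 0 & 1 \end{pmatrix},
\qquad
\begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ -1 & 0 & 1 \end{pmatrix}.
\]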


Write the first matrix as the product of two elementary matrices. 


Elementary matrices provide a useful tool to relate a matrix to its 
reduced row echelon form. We have shown above that the matrix 
obtained from a matrix B after performing one row operation is equal 
to a product EB, where E is the elementary matrix obtained from I by
that same row operation. 

For example, suppose we want to put the matrix 


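\[
B = \begin{pmatrix} 1 & 2 & 4\\ 1 & 3 & 6\\ -1 & 0 & 1 \end{pmatrix}
\]

into reduced row echelon form. Our first step is

\[
B = \begin{pmatrix} 1 & 2 & 4\\ 1 & 3 & 6\\ -1 & 0 & 1 \end{pmatrix}
\xrightarrow{R_2 - R_1}
\begin{pmatrix} 1 & 2 & 4\\ 0 & 1 & 2\\ -1 & 0 & 1 \end{pmatrix}.
\]

We perform the same operation on the identity matrix to obtain an
elementary matrix, which we will denote by E_1:

\[
I = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\xrightarrow{R_2 - R_1}
\begin{pmatrix} 1 & 0 & 0\\ -1 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix} = E_1.
\]

Then the matrix E_1 B is

\[
E_1 B = \begin{pmatrix} 1 & 0 & 0\\ -1 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 4\\ 1 & 3 & 6\\ -1 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 4\\ 0 & 1 & 2\\ -1 & 0 & 1 \end{pmatrix},
\]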


which is the matrix obtained from B after the row operation. 
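
The same comparison can be made in a few lines of code. This is a small sketch using plain Python lists; the helper names `identity`, `add_multiple` and `matmul` are illustrative assumptions. It applies R2 − R1 once to the identity (giving E_1) and once directly to B, then checks that E_1 B agrees with the directly reduced matrix.

```python
def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def add_multiple(M, target, source, k):
    """Return a copy of M with k times row `source` added to row `target` (0-based)."""
    result = [row[:] for row in M]
    result[target] = [t + k * s for t, s in zip(M[target], M[source])]
    return result

def matmul(X, Y):
    """Return the matrix product XY."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

B = [[1, 2, 4],
     [1, 3, 6],
     [-1, 0, 1]]

E1 = add_multiple(identity(3), target=1, source=0, k=-1)   # R2 - R1 applied to I
direct = add_multiple(B, target=1, source=0, k=-1)         # R2 - R1 applied to B
product = matmul(E1, B)                                    # the product E1 B
```

Rows are indexed from 0 in the code, so `target=1, source=0` corresponds to rows 2 and 1 in the notation of the text.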

We now want to look at the invertibility of elementary matrices and 
row operations. First, note that any elementary row operation can be 
undone by an elementary row operation. 


RO1 is multiply a row by a non-zero constant. 
To undo RO1, multiply the row by 1/(constant). 
RO2 is interchange two rows. 
To undo RO2, interchange the rows again. 
RO3 is add a multiple of one row to another.
To undo RO3, subtract the multiple of one row from the other. 


If we obtain an elementary matrix by performing one row operation 
on the identity, and another elementary matrix from the row operation 
which ‘undoes’ it, then multiplying these matrices together will return 
the identity matrix. That is, they are inverses of one another. This 
argument establishes the following theorem: 


Theorem 3.3 Any elementary matrix is invertible, and the inverse is 
also an elementary matrix. 


Activity 3.4 Let

\[
E = \begin{pmatrix} 1 & 0 & 0\\ -4 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}.
\]

Write down E^{-1}. Then show that EE^{-1} = I and E^{-1}E = I.
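

We saw earlier in our example that multiplying E_1 B we obtain

\[
E_1 B = \begin{pmatrix} 1 & 0 & 0\\ -1 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 4\\ 1 & 3 & 6\\ -1 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 4\\ 0 & 1 & 2\\ -1 & 0 & 1 \end{pmatrix}.
\]

We can undo this row operation and return the matrix B by multiplying
on the left by E_1^{-1}:

\[
\begin{pmatrix} 1 & 0 & 0\\ 1 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 4\\ 0 & 1 & 2\\ -1 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 4\\ 1 & 3 & 6\\ -1 & 0 & 1 \end{pmatrix}.
\]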
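
To see Theorem 3.3 at work for all three types of row operation, the sketch below builds one elementary matrix of each type together with the elementary matrix of the operation that undoes it, and checks that each pair multiplies to the identity in both orders. The constructor names `scale`, `swap` and `add` are hypothetical helpers introduced here for illustration, and the particular constants are arbitrary choices.

```python
from fractions import Fraction

def identity(n):
    """Return the n x n identity matrix with exact rational entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Return the matrix product XY."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def scale(n, row, c):
    """RO1: elementary matrix that multiplies `row` by the non-zero constant c."""
    E = identity(n)
    E[row][row] = Fraction(c)
    return E

def swap(n, r1, r2):
    """RO2: elementary matrix that interchanges rows r1 and r2."""
    E = identity(n)
    E[r1], E[r2] = E[r2], E[r1]
    return E

def add(n, target, source, k):
    """RO3: elementary matrix that adds k times row `source` to row `target`."""
    E = identity(n)
    E[target][source] = Fraction(k)
    return E

# Each pair is (elementary matrix, elementary matrix of the undoing operation).
pairs = [
    (scale(3, 1, 3), scale(3, 1, Fraction(1, 3))),   # multiply by 3, then by 1/3
    (swap(3, 0, 1), swap(3, 0, 1)),                  # interchange, interchange again
    (add(3, 1, 0, 4), add(3, 1, 0, -4)),             # add 4 R1, then subtract 4 R1
]
checks = [matmul(E, E_inv) == identity(3) and matmul(E_inv, E) == identity(3)
          for E, E_inv in pairs]
```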


3.1.2 Row equivalence 


Definition 3.5 If A and B are m × n matrices, we say that A is row
equivalent to B if and only if there is a sequence of elementary row
operations to transform A into B.


This is an example of what is known as an equivalence relation. This
means it satisfies three important conditions; it is:


• reflexive: A ∼ A,
• symmetric: A ∼ B ⇒ B ∼ A,
• transitive: A ∼ B and B ∼ C ⇒ A ∼ C.


Activity 3.6 Argue why it is true that row equivalence is an equivalence
relation; that is, explain why row equivalence as defined above satisfies 
these three conditions. 


The existence of an algorithm for putting a matrix A into reduced row 
echelon form by a sequence of row operations has the consequence that 
every matrix is row equivalent to a matrix in reduced row echelon form. 
This fact is stated in the following theorem. 


Theorem 3.7 Every matrix is row equivalent to a matrix in reduced 
row echelon form. 


3.1.3 The main theorem 
We are now ready to answer the first of our questions: ‘When is a matrix 
invertible?’ We collect our results in the following theorem. 


Theorem 3.8 If A is an n × n matrix, then the following statements are
equivalent (meaning if any one of these statements is true for A, then
all the statements are true):

(1) A^{-1} exists.
(2) Ax = b has a unique solution for any b ∈ R^n.
(3) Ax = 0 only has the trivial solution, x = 0.
(4) The reduced row echelon form of A is I.


Proof: If we show that (1) ⇒ (2) ⇒ (3) ⇒ (4) ⇒ (1), then any one
statement will imply all the others, so the statements are equivalent.


(1) ⇒ (2). We assume that A^{-1} exists, and consider the system of
linear equations Ax = b where x is the vector of unknowns and b is any
vector in R^n. We use the matrix A^{-1} to solve for x by multiplying the
equation on the left by A^{-1}. We have

\[
A^{-1}A\mathbf{x} = A^{-1}\mathbf{b} \implies I\mathbf{x} = A^{-1}\mathbf{b} \implies \mathbf{x} = A^{-1}\mathbf{b}.
\]

This shows that x = A^{-1}b is the only possible solution; and it is a
solution, since A(A^{-1}b) = (AA^{-1})b = Ib = b. So Ax = b has a unique
solution for any b ∈ R^n.


(2) ⇒ (3). If Ax = b has a unique solution for all b ∈ R^n, then this
is true for b = 0. The unique solution of Ax = 0 must be the trivial
solution, x = 0.


(3) ⇒ (4). If the only solution of Ax = 0 is x = 0, then there are no
free (non-leading) variables and the reduced row echelon form of A
must have a leading one in every column. Since the matrix is square
and a leading one in a lower row is further to the right, A must have a
leading one in every row. Since every column with a leading one has
zeros elsewhere, this can only be the n × n identity matrix.


(4) ⇒ (1). We now make use of elementary matrices. If A is row
equivalent to I, then there is a sequence of row operations which reduce
A to I, so there must exist elementary matrices E_1, ..., E_r such that

\[
E_r E_{r-1} \cdots E_1 A = I.
\]

Each elementary matrix has an inverse. We use these to solve the above
equation for A, by first multiplying the equation on the left by E_r^{-1},
then by E_{r-1}^{-1}, and so on, to obtain

\[
A = E_1^{-1} \cdots E_{r-1}^{-1} E_r^{-1}.
\]

This says that A is a product of invertible matrices, hence invertible.
(Recall from Chapter 1 that if A and B are invertible matrices of the
same size, then the product AB is invertible and its inverse is the product
of the inverses in the reverse order, (AB)^{-1} = B^{-1}A^{-1}.)

This proves the theorem.


3.1.4 Using row operations to find the inverse matrix


From the proof of Theorem 3.8, we have

\[
A = E_1^{-1} \cdots E_r^{-1},
\]

where the matrices E_i are the elementary matrices corresponding to the
row operations used to reduce A to the identity matrix, I. Then, taking
the inverse of both sides,

\[
A^{-1} = (E_1^{-1} \cdots E_r^{-1})^{-1} = E_r \cdots E_1 = E_r \cdots E_1 I.
\]

This tells us that if we apply the same row operations to the matrix I
that we use to reduce A to I, then we will obtain the matrix A^{-1}. That
is, if

\[
E_r E_{r-1} \cdots E_1 A = I,
\]

then

\[
A^{-1} = E_r \cdots E_1 I.
\]

This gives us a method to find the inverse of a matrix A. We start with
the matrix A and we form a new, larger matrix by placing the identity
matrix to the right of A, obtaining the matrix denoted (A|I). We then
use row operations to reduce this to (I|B). If this is not possible (which
will become apparent) then the matrix is not invertible. If it can be done,
then A is invertible and B = A^{-1}.


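Example 3.9 We use this method to find the inverse of the matrix

\[
A = \begin{pmatrix} 1 & 2 & 4\\ 1 & 3 & 6\\ -1 & 0 & 1 \end{pmatrix}.
\]

In order to determine if the matrix is invertible and, if so, to determine
the inverse, we form the matrix

\[
(A \mid I) = \left(\begin{array}{ccc|ccc} 1 & 2 & 4 & 1 & 0 & 0\\ 1 & 3 & 6 & 0 & 1 & 0\\ -1 & 0 & 1 & 0 & 0 & 1 \end{array}\right).
\]

(We have separated A from I by a vertical line just to emphasise how
this matrix is formed. It is also helpful in the calculations.) Then we
carry out elementary row operations.

\[
\begin{aligned}
&\xrightarrow[R_3 + R_1]{R_2 - R_1}
\left(\begin{array}{ccc|ccc} 1 & 2 & 4 & 1 & 0 & 0\\ 0 & 1 & 2 & -1 & 1 & 0\\ 0 & 2 & 5 & 1 & 0 & 1 \end{array}\right)\\
&\xrightarrow{R_3 - 2R_2}
\left(\begin{array}{ccc|ccc} 1 & 2 & 4 & 1 & 0 & 0\\ 0 & 1 & 2 & -1 & 1 & 0\\ 0 & 0 & 1 & 3 & -2 & 1 \end{array}\right)\\
&\xrightarrow[R_2 - 2R_3]{R_1 - 4R_3}
\left(\begin{array}{ccc|ccc} 1 & 2 & 0 & -11 & 8 & -4\\ 0 & 1 & 0 & -7 & 5 & -2\\ 0 & 0 & 1 & 3 & -2 & 1 \end{array}\right)\\
&\xrightarrow{R_1 - 2R_2}
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 3 & -2 & 0\\ 0 & 1 & 0 & -7 & 5 & -2\\ 0 & 0 & 1 & 3 & -2 & 1 \end{array}\right).
\end{aligned}
\]

This is now in the form (I|B), so we deduce that A is invertible and
that

\[
A^{-1} = \begin{pmatrix} 3 & -2 & 0\\ -7 & 5 & -2\\ 3 & -2 & 1 \end{pmatrix}.
\]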

It is very easy to make mistakes when row reducing a matrix, so the
next thing you should do is check that AA^{-1} = I.


Activity 3.10 Do this. Check that when you multiply AA^{-1}, you get the
identity matrix I. (In order to establish that this is the inverse matrix,
you should also show A^{-1}A = I, but we will forgo that here. We’ll
come back to this issue shortly.)


If the matrix A is not invertible, what will happen? By Theorem 3.8, if
A is not invertible, then the reduced row echelon form of A cannot be
I, so there will be a row of zeros in the row echelon form of A.
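
The method of reducing (A|I) to (I|B) can be sketched in code as follows. This is a minimal illustration using exact fractions, not a numerically tuned routine; the function name `inverse` and its convention of returning `None` for a non-invertible matrix are assumptions made here. The second test matrix is the coefficient matrix from Activity 2.27, whose row echelon form contains a row of zeros.

```python
from fractions import Fraction

def inverse(A):
    """Row reduce (A | I) to (I | B) and return B, or None if A is not invertible."""
    n = len(A)
    # Build the augmented matrix (A | I) with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Look for a non-zero entry in this column, at or below the diagonal.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                       # A cannot be reduced to I
        M[col], M[pivot] = M[pivot], M[col]   # interchange rows if necessary
        lead = M[col][col]
        M[col] = [x / lead for x in M[col]]   # create the leading one
        for r in range(n):                    # zeros elsewhere in the column
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]             # the right-hand block is the inverse

A = [[1, 2, 4],
     [1, 3, 6],
     [-1, 0, 1]]
A_inv = inverse(A)

# The coefficient matrix from Activity 2.27 reduces to a matrix with a zero row.
singular = inverse([[1, 2, 1],
                    [2, 2, 0],
                    [3, 4, 1]])
```

The routine chooses its own order of row operations, so the intermediate matrices need not match a hand calculation step by step; only the final block is expected to agree.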


Activity 3.11 Find the inverse, if it exists, of each of the following 
matrices 


3.1.5 Verifying an inverse 
At this stage, in order to show that a square matrix B is the inverse
of the n × n matrix A, it seems we have to show that both statements,
AB = I and BA = I, are true. However, after we have proved the
following theorem (which follows from Theorem 3.8), we will be able
to deduce from the single statement AB = I that A and B must be
inverses of one another.


Theorem 3.12 If A and B are n × n matrices and AB = I, then A
and B are each invertible matrices, and A = B^{-1} and B = A^{-1}.


Proof. If we show that the homogeneous system of equations Bx = 0
has only the trivial solution, x = 0, then by Theorem 3.8 this will prove
that B is invertible. So we consider the matrix equation Bx = 0 and
multiply both sides of this equation on the left by the matrix A. We have

\[
B\mathbf{x} = \mathbf{0} \implies A(B\mathbf{x}) = A\mathbf{0} \implies (AB)\mathbf{x} = \mathbf{0}.
\]

But we are given that AB = I, so that

\[
(AB)\mathbf{x} = \mathbf{0} \implies I\mathbf{x} = \mathbf{0} \implies \mathbf{x} = \mathbf{0},
\]

which shows that the only solution of Bx = 0 is the trivial solution. We
therefore conclude that B is invertible, so the matrix B^{-1} exists.

We now multiply both sides of the equation AB = I on the right
by the matrix B^{-1}. We have

\[
AB = I \implies (AB)B^{-1} = IB^{-1} \implies A(BB^{-1}) = B^{-1} \implies A = B^{-1}.
\]

So A is the inverse of B, and therefore A is also an invertible matrix.
Then taking inverses of both sides of the last equation, we conclude that
A^{-1} = (B^{-1})^{-1} = B.


3.2 Determinants 
3.2.1 Determinant using cofactors 


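The determinant of a square matrix A is a particular number associated
with A, written |A| or det A. This number will provide a quick way to
determine whether or not a matrix A is invertible. In view of this,
suppose A is a 2 × 2 matrix, and that we wish to determine A^{-1} using row
operations. Then we form the matrix (A | I) and attempt to row reduce
A to I. We assume a ≠ 0, otherwise we would begin by switching rows:

\[
(A \mid I) = \left(\begin{array}{cc|cc} a & b & 1 & 0\\ c & d & 0 & 1 \end{array}\right)
\xrightarrow{(1/a)R_1}
\left(\begin{array}{cc|cc} 1 & b/a & 1/a & 0\\ c & d & 0 & 1 \end{array}\right)
\]
\[
\xrightarrow{R_2 - cR_1}
\left(\begin{array}{cc|cc} 1 & b/a & 1/a & 0\\ 0 & d - cb/a & -c/a & 1 \end{array}\right)
\xrightarrow{aR_2}
\left(\begin{array}{cc|cc} 1 & b/a & 1/a & 0\\ 0 & ad - bc & -c & a \end{array}\right),
\]

which shows that A^{-1} exists if and only if ad − bc ≠ 0.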

For a 2 × 2 matrix, the determinant is given by the formula

\[
\begin{vmatrix} a & b\\ c & d \end{vmatrix} = ad - bc.
\]

Note the vertical bars used in the notation for the determinant of a matrix
(and note also that we usually dispense with the large parentheses around
the matrix when we write its determinant).

For example,


\[
\begin{vmatrix} 1 & 2\\ 3 & 4 \end{vmatrix} = (1)(4) - (2)(3) = -2.
\]


To extend this definition to n × n matrices, we define the determinant of
an n × n matrix recursively in terms of (n − 1) × (n − 1) determinants.
So the determinant of a 3 × 3 matrix is given in terms of 2 × 2 matrices,
and so on. To do this, we will need the following two definitions.


Definition 3.13 Suppose A is an n × n matrix. The (i, j) minor of
A, denoted by M_ij, is the determinant of the (n − 1) × (n − 1) matrix
obtained by removing the ith row and jth column of A.


Definition 3.14 The (i, j) cofactor of a matrix A is

\[
C_{ij} = (-1)^{i+j} M_{ij}.
\]

So the cofactor is equal to the minor if i + j is even, and it is equal to
the negative of the minor if i + j is odd.


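Example 3.15 Let

\[
A = \begin{pmatrix} 1 & 2 & 3\\ 4 & 1 & 1\\ -1 & 3 & 0 \end{pmatrix}.
\]

Then the minor M_23 and the cofactor C_23 are

\[
M_{23} = \begin{vmatrix} 1 & 2\\ -1 & 3 \end{vmatrix} = 5,
\qquad
C_{23} = (-1)^{(2+3)} M_{23} = -5.
\]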

There is a simple way to associate the cofactor C_ij with the entry a_ij of
the matrix. Locate the entry a_ij and cross out the row and the column
containing a_ij. Then evaluate the determinant of the (n − 1) × (n − 1)
matrix which remains. This is the minor, M_ij. Then give it a ‘+’ or ‘−’
sign according to the position of a_ij on the following pattern:

\[
\begin{pmatrix}
+ & - & + & - & \cdots\\
- & + & - & + & \cdots\\
+ & - & + & - & \cdots\\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}
\]

Activity 3.16 Write down the cofactor C_13 for the matrix A above using
this method.


If A is an n × n matrix, the determinant of A is given by

\[
|A| = \begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}
= a_{11}C_{11} + a_{12}C_{12} + \cdots + a_{1n}C_{1n}.
\]

This is called the cofactor expansion of |A| by row one. It is a recursive
definition, meaning that the determinant of an n × n matrix is given in
terms of some (n − 1) × (n − 1) determinants.
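
Because the definition is recursive, it translates almost directly into a recursive function. The sketch below is one way to write it; the names `minor_matrix` and `det` are illustrative, and taking the determinant of a 1 × 1 matrix to be its single entry is an assumption used as the base case (it reproduces ad − bc for 2 × 2 matrices).

```python
def minor_matrix(A, i, j):
    """Return A with row i and column j removed (indices are 0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]          # assumed base case: a 1 x 1 determinant
    # With 0-based column index j, the sign (-1)^(1 + (j + 1)) equals (-1)^j.
    return sum((-1) ** j * A[0][j] * det(minor_matrix(A, 0, j))
               for j in range(len(A)))

A = [[1, 2, 3],
     [4, 1, 1],
     [-1, 3, 0]]
value = det(A)
```

The matrix used here is the matrix A of Example 3.15. The recursion makes n calls on (n − 1) × (n − 1) matrices at each level, so it is only practical for small matrices.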


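Example 3.17 We calculate the determinant of the matrix A in
Example 3.15:

\[
\begin{aligned}
|A| &= 1C_{11} + 2C_{12} + 3C_{13}\\
&= 1\begin{vmatrix} 1 & 1\\ 3 & 0 \end{vmatrix}
 - 2\begin{vmatrix} 4 & 1\\ -1 & 0 \end{vmatrix}
 + 3\begin{vmatrix} 4 & 1\\ -1 & 3 \end{vmatrix}\\
&= 1(-3) - 2(1) + 3(13) = 34.
\end{aligned}
\]


Activity 3.18 Calculate the determinant of the matrix

\[
M = \begin{pmatrix} -1 & 2 & 1\\ 0 & 2 & 3\\ 1 & 1 & 4 \end{pmatrix}.
\]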


You might ask: ‘Why is the cofactor expansion given by row 1, rather 
than any other row?’ In fact, it turns out that using a cofactor expansion 
by any row or column of A will give the same number |A|, as the 
following theorem states. 


Theorem 3.19 If A is an n × n matrix, then the determinant of A can
be computed by multiplying the entries of any row (or column) by their
cofactors and summing the resulting products:

\[
|A| = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in}
\qquad \text{(cofactor expansion by row } i\text{)}
\]

\[
|A| = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj}
\qquad \text{(cofactor expansion by column } j\text{).}
\]


We will look into the proof of this result later, but first note that this 
allows you to choose any row or any column of a matrix to find its 
determinant using a cofactor expansion. So we should choose a row or 
column which gives the simplest calculations. 

Obtaining the correct value for |A| is important, so it is a good idea
to check your result by calculating the determinant by another row or 
column. 
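
The check suggested here can be automated for small matrices. The sketch below expands the determinant of the matrix A of Example 3.15 along every row and every column and collects the six results. The function names are illustrative assumptions; `det` is a first-row cofactor expansion used only to evaluate the minors, with the determinant of a 1 × 1 matrix taken to be its entry.

```python
def minor_matrix(A, i, j):
    """Return A with row i and column j removed (indices are 0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor_matrix(A, 0, j))
               for j in range(len(A)))

def expand_row(A, i):
    """Cofactor expansion of |A| by row i (0-based)."""
    return sum((-1) ** (i + j) * A[i][j] * det(minor_matrix(A, i, j))
               for j in range(len(A)))

def expand_column(A, j):
    """Cofactor expansion of |A| by column j (0-based)."""
    return sum((-1) ** (i + j) * A[i][j] * det(minor_matrix(A, i, j))
               for i in range(len(A)))

A = [[1, 2, 3],
     [4, 1, 1],
     [-1, 3, 0]]
values = [expand_row(A, i) for i in range(3)] + [expand_column(A, j) for j in range(3)]
```

The parity of i + j is the same whether the indices start at 0 or at 1, so the sign factor in the code agrees with the cofactor signs of the text.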


Example 3.20 In the matrix of Example 3.15, instead of using the 
cofactor expansion by row 1 as shown above, we can choose to evaluate 
the determinant of the matrix A by row 3 or column 3, which will involve 
fewer calculations since a33 = 0. To check the result |A| = 34, we will 
evaluate the determinant again; this time using column 3. Remember 
the correct cofactor signs: 


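\[
|A| = \begin{vmatrix} 1 & 2 & 3\\ 4 & 1 & 1\\ -1 & 3 & 0 \end{vmatrix}
= 3\begin{vmatrix} 4 & 1\\ -1 & 3 \end{vmatrix}
- 1\begin{vmatrix} 1 & 2\\ -1 & 3 \end{vmatrix} + 0
= 3(13) - (5) = 34.
\]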


Activity 3.21 Check your calculation of the determinant of the matrix 


\[
M = \begin{pmatrix} -1 & 2 & 1\\ 0 & 2 & 3\\ 1 & 1 & 4 \end{pmatrix}
\]


in the previous activity by expanding by a different row or column. 
Choose one with fewer calculations. 


3.2.2 Determinant as a sum of elementary signed products 


We will give an informal proof of Theorem 3.19, because it is useful to 
understand how the definition of determinant works. This section can 
be safely omitted (meaning you can simply accept the theorem without 
proof and move on), but you might find it worth your while to read 
through it. 

For a 2 × 2 matrix, the cofactor expansion by row 1 is equivalent to
the definition given on page 99:

\[
\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix}
= a_{11}a_{22} - a_{12}a_{21}.
\]

Notice that each term of the sum is a product of entries, one from each
row and one from each column. Indeed, a_11 is the entry from row 1 and
column 1, and a_22 is not in either: it comes from row 2 and column 2.
Similarly, the second term, a_12 a_21, is the only different way of taking
one entry from each row and each column of the matrix.

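For a 3 × 3 matrix, the cofactor expansion by row 1 yields

\[
\begin{aligned}
\begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{vmatrix}
&= a_{11}\begin{vmatrix} a_{22} & a_{23}\\ a_{32} & a_{33} \end{vmatrix}
 - a_{12}\begin{vmatrix} a_{21} & a_{23}\\ a_{31} & a_{33} \end{vmatrix}
 + a_{13}\begin{vmatrix} a_{21} & a_{22}\\ a_{31} & a_{32} \end{vmatrix}\\
&= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}).
\end{aligned}
\]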

Then |A| is the sum of the products:

\[
\begin{aligned}
&a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32}\\
&\quad - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}.
\end{aligned}
\tag{$*$}
\]

The row indices of each product are in ascending order, 123, and the
column indices are:

    123   231   312
    132   213   321.

These are the six permutations of the numbers 1, 2, 3.


Definition 3.22 A permutation of a set of integers {1, 2, 3, ..., n} is
an arrangement of these integers in some order with no omissions and
no repetitions.


To find all permutations of a set of numbers, we can use a permutation 
tree: 


     1            2            3        ← 3 choices
    / \          / \          / \
   2   3        1   3        1   2      ← 2 choices      3 · 2 · 1 = 3!
   |   |        |   |        |   |
   3   2        3   1        2   1      ← 1 choice

In the above expansion of |A|, each term has the row indices arranged
in ascending order and the column indices form a different permutation 
of the numbers 1,2,3. We know, therefore, that each term of the sum 
is a different product of entries, one from each row and one from each 
column of A, and the set of six products contains all ways in which this 
can happen. 

But what about the minus signs? An inversion is said to occur in 
a permutation whenever a larger integer precedes a smaller one. For 
example, 


1 2 3 ← no inversions        1 3 2 ← one inversion. 


A permutation is said to be even if the total number of inversions is 
even. It is odd if the total number of inversions is odd. 

To find the total number of inversions of a permutation, we can 
start at the left and find the total number of integers to the right of the 
first integer which are smaller than the first integer. Then go to the next 
integer to the right and do the same. Continue until the end, and then 
add up all these numbers. 
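
The counting procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `count_inversions` and `parity` are our own choices rather than notation from the text, and a permutation is assumed to be given as a list of distinct integers.

```python
def count_inversions(perm):
    """Count the pairs in which a larger integer precedes a smaller one."""
    total = 0
    for i, first in enumerate(perm):
        # integers to the right of position i that are smaller than perm[i]
        total += sum(1 for later in perm[i + 1:] if later < first)
    return total


def parity(perm):
    """Return 'even' or 'odd' according to the total number of inversions."""
    return "even" if count_inversions(perm) % 2 == 0 else "odd"


print(count_inversions([1, 2, 3]), parity([1, 2, 3]))  # 0 even
print(count_inversions([1, 3, 2]), parity([1, 3, 2]))  # 1 odd
```

The two calls reproduce the small cases shown above: 1 2 3 has no inversions and is even, while 1 3 2 has one inversion and is odd.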


Example 3.23 Consider the permutation 5 2 3 4 1. We apply the method 
just described: 5 precedes four smaller integers, and each of 2, 3 and 4 
precedes one smaller integer (namely 1). This tells us that the total 
number of inversions is 


4 + 1 + 1 + 1 = 7, 

so this permutation is odd. The total number of inversions gives the 
minimum number of steps that it takes to put these numbers into ascending 
order, where in each step you are only allowed to switch the positions of 
two adjacent numbers. For the permutation 5 2 3 4 1, this can be done 
in seven steps as follows: 


52341 → 25341 → 23541 → 23451 
→ 23415 → 23145 → 21345 → 12345. 


If we look again at the list of products (*), we find that the permutations 
of the column indices corresponding to the products with a plus sign 
are all even, and those corresponding to the products with a minus sign 
are all odd. 


Definition 3.24 An elementary product from an n x n matrix A is a 
product of n entries, no two of which come from the same row or 
column. A signed elementary product is an elementary product with the 
row indices arranged in ascending order, multiplied by −1 if the column 
indices are an odd permutation of the numbers 1 to n. 


We are now ready to give an intrinsic (but completely impractical) 
definition of determinant. 


Definition 3.25 (Determinant) The determinant of an n × n matrix A 
is the sum of all signed elementary products of A. 
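
Definition 3.25 can be turned directly into a computation, which also shows why it is impractical: the loop below visits all n! permutations. This is a minimal sketch under our own assumptions: the matrix is stored as a list of rows with 0-based indices, and the names `sign` and `det_by_permutations` are hypothetical.

```python
from itertools import permutations
from math import prod


def sign(perm):
    """+1 for an even permutation, -1 for an odd one (by counting inversions)."""
    inversions = sum(1
                     for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det_by_permutations(a):
    """Sum of all signed elementary products of the square matrix a."""
    n = len(a)
    total = 0
    for cols in permutations(range(n)):
        # rows are taken in ascending order; cols is the column permutation
        total += sign(cols) * prod(a[row][cols[row]] for row in range(n))
    return total


print(det_by_permutations([[1, 2], [3, 4]]))  # -2
```

For the 2 × 2 case the two permutations give the products 1·4 (even) and 2·3 (odd), so the result agrees with the familiar formula.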


A cofactor expansion is a clever way to obtain this sum of signed 
elementary products. You choose the entries from one row, say, and then 
cross out that row and the column containing the entry to obtain the 
cofactor, and each stage of calculating the cofactor repeats the process. 
At the heart of a proof of Theorem 3.19 is the fact that each possible 
cofactor expansion is the sum of all signed elementary products, and so 
all the cofactor expansions are equal to each other. 
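
The claim that every cofactor expansion gives the same number can be checked on a small example. The sketch below is illustrative only: the helper names `minor`, `det_by_row` and `det_by_column` are our own, indices are 0-based, and the matrix `M` is an arbitrary choice that does not come from the text.

```python
def minor(a, i, j):
    """Delete row i and column j (0-based) from the matrix a."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det_by_row(a, i=0):
    """Cofactor expansion of |a| by row i."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** (i + j) * a[i][j] * det_by_row(minor(a, i, j))
               for j in range(len(a)))


def det_by_column(a, j=0):
    """Cofactor expansion of |a| by column j."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** (i + j) * a[i][j] * det_by_row(minor(a, i, j))
               for i in range(len(a)))


M = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]  # arbitrary example matrix

values = ([det_by_row(M, i) for i in range(3)]
          + [det_by_column(M, j) for j in range(3)])
print(values)  # [39, 39, 39, 39, 39, 39]
```

All six expansions (three rows, three columns) return the same value, as Theorem 3.19 asserts.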


Activity 3.26 Expand the determinant 

$$|A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$$

using the cofactor expansion by column 2, and show that you get 
the same list of signed elementary products as we obtained in (*) on 
page 102. 

For very large matrices, using a cofactor expansion is impractical. For 
example, 

$$|A| = \begin{vmatrix} 1 & -4 & 3 & 2 \\ 2 & -7 & 5 & 1 \\ 1 & 2 & 6 & 0 \\ 2 & -10 & 14 & 4 \end{vmatrix} = 1C_{11} + (-4)C_{12} + 3C_{13} + 2C_{14}$$


would require calculating four 3 x 3 determinants. Fortunately, there is 
a better way. To simplify the calculations, we will turn once again to 
row operations. But first we need to establish some useful results on 
determinants, which follow directly from Theorem 3.19. 


3.3 Results on determinants 


We now look at some standard and useful properties of determinants. 

Theorem 3.27 If A is an n × n matrix, then 

$$|A^{T}| = |A|.$$

Proof: This theorem follows immediately from Theorem 3.19. The 
cofactor expansion by row i of $|A^{T}|$ is precisely the same, number for 
number, as the cofactor expansion by column i of |A|. 


Each of the following three statements follows from Theorem 3.19. 
(They are ‘corollaries’, meaning consequences, of the theorem.) As a 
result of Theorem 3.27, it follows that each is true if the word row is 
replaced by column. We will need these results in the next section. In 
all of them, we assume that A is an n × n matrix. 


Corollary 3.28 If a row of A consists entirely of zeros, then |A| = 0. 


Proof: If we evaluate the determinant by the cofactor expansion using 
the row of zeros, then each cofactor is multiplied by 0 and the sum will 
be zero. To visualise this, expand the determinant below using row 1: 


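$$\begin{vmatrix} 0 & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = 0C_{11} + 0C_{12} + \cdots + 0C_{1n} = 0.$$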


Corollary 3.29 If A contains two rows which are equal, then |A| = 0. 

Proof: To prove this, we will use an inductive argument. If A is a 2 × 2 
matrix with two equal rows, then 

$$|A| = \begin{vmatrix} a & b \\ a & b \end{vmatrix} = ab - ab = 0.$$

Now consider a 3 × 3 matrix with two equal rows. If we expand the 
determinant by the other row, then each cofactor is a 2 × 2 determinant 
with two equal rows, therefore each is zero and so is their sum. For 
example, 

$$|A| = \begin{vmatrix} a & b & c \\ d & e & f \\ a & b & c \end{vmatrix}
= -d\begin{vmatrix} b & c \\ b & c \end{vmatrix} + e\begin{vmatrix} a & c \\ a & c \end{vmatrix} - f\begin{vmatrix} a & b \\ a & b \end{vmatrix}
= 0 + 0 + 0 = 0.$$

Generally, in a similar way, the result for (n − 1) × (n − 1) matrices 
implies the result for n × n matrices. 


Corollary 3.30 If the cofactors of one row are multiplied by the entries 
of a different row and added, then the result is 0. That is, if i ≠ j, then 

$$a_{j1}C_{i1} + a_{j2}C_{i2} + \cdots + a_{jn}C_{in} = 0.$$


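Proof: Let 

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}.$$

The cofactor expansion of |A| by row i is 

$$|A| = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in}.$$

Look at the expression 

$$a_{j1}C_{i1} + a_{j2}C_{i2} + \cdots + a_{jn}C_{in} \qquad \text{for } i \neq j.$$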


This expression is not equal to |A|, so what is it? It is equal to |B| for 
some matrix B, but what does the matrix B look like? 

In the expression $|B| = a_{j1}C_{i1} + a_{j2}C_{i2} + \cdots + a_{jn}C_{in}$, each 
cofactor $C_{ik}$, for k = 1, ..., n, is made up of entries of the matrix 
A, omitting the entries from row i. For example, $C_{i1}$ is obtained from 
the matrix that results from removing row i and column 1 from A, and 
in general $C_{ik}$ is obtained by removing row i and column k. So the 
matrix B will have the same entries as the matrix A except in row i. 
In the cofactor expansion of a determinant by row i, the entries of row 
i are the numbers multiplying the cofactors. Therefore, the entries of 
row i of the matrix B must be $a_{j1}, \dots, a_{jn}$. Then B has two equal rows, 
since row i has the same entries as row j. It follows, by Corollary 3.29, 
that |B| = 0, and the result follows. 


Corollary 3.31 If $A = (a_{ij})$ and if each entry of one of the rows, say 
row i, can be expressed as a sum of two numbers, $a_{ij} = b_{ij} + c_{ij}$ for 
1 ≤ j ≤ n, then |A| = |B| + |C|, where B is the matrix A with row i 
replaced by $b_{i1}, b_{i2}, \dots, b_{in}$ and C is the matrix A with row i replaced 
by $c_{i1}, c_{i2}, \dots, c_{in}$. 


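Proof. First, let’s illustrate this with a 3 × 3 matrix. The corollary states 
that, for example, 

$$|A| = \begin{vmatrix} a & b & c \\ d+p & e+q & f+r \\ g & h & i \end{vmatrix}
= \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix}
+ \begin{vmatrix} a & b & c \\ p & q & r \\ g & h & i \end{vmatrix}
= |B| + |C|.$$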


To show this is true, you just need to use the cofactor expansion by 
row 2 for each of the determinants. We have 


$$\begin{aligned}
|A| &= (d + p)C_{21} + (e + q)C_{22} + (f + r)C_{23} \\
&= dC_{21} + eC_{22} + fC_{23} + pC_{21} + qC_{22} + rC_{23} \\
&= |B| + |C|,
\end{aligned}$$

where the cofactors $C_{21}, C_{22}, C_{23}$ are exactly the same in each expansion 
(of the determinants of B and C), since each consists entirely of entries 
from the matrix A other than those in row 2. 


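The proof for a general matrix A is exactly the same. The cofactor 
expansion by row i yields 

$$\begin{aligned}
|A| &= a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in} \\
&= (b_{i1} + c_{i1})C_{i1} + (b_{i2} + c_{i2})C_{i2} + \cdots + (b_{in} + c_{in})C_{in} \\
&= (b_{i1}C_{i1} + b_{i2}C_{i2} + \cdots + b_{in}C_{in}) + (c_{i1}C_{i1} + c_{i2}C_{i2} + \cdots + c_{in}C_{in}) \\
&= |B| + |C|.
\end{aligned}$$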


3.3.1 Determinant using row operations 


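In this section, we take a different approach to evaluating determinants, 
by making use of row operations. 

Definition 3.32 An n × n matrix A is upper triangular if all entries 
below the main diagonal are zero. It is lower triangular if all entries 
above the main diagonal are zero. 

upper triangular matrix: 

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}$$

lower triangular matrix: 

$$\begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$

diagonal matrix: 

$$\begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}$$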


Suppose we wish to evaluate the determinant of an upper triangular 
matrix, such as 


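$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}.$$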


Which row or column should we use for the cofactor expansion? Clearly, 
the calculations are simplest if we expand by column 1 or row n. 
Expansion by column 1 gives us 


$$|A| = a_{11}\begin{vmatrix} a_{22} & \cdots & a_{2n} \\ \vdots & \ddots & \vdots \\ 0 & \cdots & a_{nn} \end{vmatrix},$$

where the (n − 1) × (n − 1) matrix on the right is again upper triangular. 
Continuing in this way, we see that |A| is just the product of the 
diagonal entries. The same argument holds true for a matrix which is 
diagonal or lower triangular, so we have established one more corollary of 
Theorem 3.19: 


Corollary 3.33 If A is upper triangular, lower triangular or diagonal, 
then 


$$|A| = a_{11}a_{22}\cdots a_{nn}.$$

A square matrix in row echelon form is upper triangular. If we know how 
a determinant is affected by a row operation, then this observation will 
give us an easier way to calculate large determinants. We can use row 
operations to put the matrix into row echelon form, keep track of any 
changes and then easily calculate the determinant of the reduced matrix. 
So how does each row operation affect the value of the determinant? 
Let’s consider each in turn. 
The first row operation is: 


RO1 multiply a row by a non-zero constant 


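Suppose the matrix B is obtained from a matrix A by multiplying row 
i by a non-zero constant α. For example, 

$$|A| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix},
\qquad
|B| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \alpha a_{21} & \alpha a_{22} & \cdots & \alpha a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}.$$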


If we evaluate |B| using the cofactor expansion by row i, we obtain 


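$$\begin{aligned}
|B| &= \alpha a_{i1}C_{i1} + \alpha a_{i2}C_{i2} + \cdots + \alpha a_{in}C_{in} \\
&= \alpha(a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in}) \\
&= \alpha|A|.
\end{aligned}$$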


So we have: 


• The effect of multiplying a row of A by α is to multiply |A| by α: 
|B| = α|A|. 


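When we actually need this, we will use it to factor out a constant α 
from the determinant as follows: 

$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \alpha a_{21} & \alpha a_{22} & \cdots & \alpha a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}
= \alpha \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}.$$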


The second type of row operation is: 
RO2 interchange two rows 


This time we will use an inductive proof involving the cofactor 
expansion. If A is a 2 × 2 matrix and B is the matrix obtained from A by 
interchanging the two rows, then 

$$|A| = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc, \qquad |B| = \begin{vmatrix} c & d \\ a & b \end{vmatrix} = bc - ad,$$

so |B| = −|A|. 

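Now let A be a 3 × 3 matrix and let B be a matrix obtained from 
A by interchanging two rows. Then if we expand |B| using a different 
row, each cofactor contains the determinant of a 2 × 2 matrix which is 
a cofactor of A with two rows interchanged, so each will be multiplied 
by −1, and |B| = −|A|. To visualise this, consider for example 

$$|A| = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix}, \qquad |B| = \begin{vmatrix} g & h & i \\ d & e & f \\ a & b & c \end{vmatrix}.$$

Expanding |A| and |B| by row 2, we have 

$$|A| = -d\begin{vmatrix} b & c \\ h & i \end{vmatrix} + e\begin{vmatrix} a & c \\ g & i \end{vmatrix} - f\begin{vmatrix} a & b \\ g & h \end{vmatrix},$$

$$|B| = -d\begin{vmatrix} h & i \\ b & c \end{vmatrix} + e\begin{vmatrix} g & i \\ a & c \end{vmatrix} - f\begin{vmatrix} g & h \\ a & b \end{vmatrix} = -|A|,$$

since all the 2 × 2 determinants change sign. In the same way, if this 
holds for (n − 1) × (n − 1) matrices, then it holds for n × n matrices. 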
So we have: 


• The effect of interchanging two rows of a matrix is to multiply the 
determinant by −1: |B| = −|A|. 


Finally, we have the third type of row operation: 
RO3 add a multiple of one row to another. 


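Suppose the matrix B is obtained from the matrix A by replacing row 
j of A by row j plus k times row i of A, j ≠ i. For example, consider 
the case in which B is obtained from A by adding 4 times row 1 of A 
to row 2. Then 

$$|A| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix},
\qquad
|B| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} + 4a_{11} & a_{22} + 4a_{12} & \cdots & a_{2n} + 4a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}.$$

In general, in a situation like this, we can expand |B| by row j: 

$$\begin{aligned}
|B| &= (a_{j1} + ka_{i1})C_{j1} + (a_{j2} + ka_{i2})C_{j2} + \cdots + (a_{jn} + ka_{in})C_{jn} \\
&= a_{j1}C_{j1} + a_{j2}C_{j2} + \cdots + a_{jn}C_{jn} + k(a_{i1}C_{j1} + a_{i2}C_{j2} + \cdots + a_{in}C_{jn}) \\
&= |A| + 0.
\end{aligned}$$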


The last expression in brackets is 0 because it consists of the cofactors 
of one row multiplied by the entries of another row (see Corollary 3.30). 
So this row operation does not change the value of |A|. 

So we see that: 


• There is no change in the value of the determinant if a multiple of 
one row is added to another. 


We collect these results in the following theorem. 


Theorem 3.34 (Effect of a row (column) operation on |A|) All 
statements are true if row is replaced by column. 


(RO1) If a row is multiplied by a constant α, then 
|A| changes to α|A|. 

(RO2) If two rows are interchanged, then 
|A| changes to −|A|. 

(RO3) If a multiple of one row is added to another, then 
there is no change in |A|. 
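
The three statements of Theorem 3.34 can be checked numerically on a small matrix. The following is a sketch only, under our own assumptions: the helper names (`det`, `scale_row`, `swap_rows`, `add_multiple`) are hypothetical, rows are indexed from 0, and the matrix `M` is an arbitrary choice that does not come from the text.

```python
def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def scale_row(a, i, alpha):
    """RO1: multiply row i by alpha."""
    return [[alpha * x for x in row] if r == i else list(row)
            for r, row in enumerate(a)]


def swap_rows(a, i, j):
    """RO2: interchange rows i and j."""
    b = [list(row) for row in a]
    b[i], b[j] = b[j], b[i]
    return b


def add_multiple(a, target, source, k):
    """RO3: replace row `target` by row `target` + k * row `source`."""
    b = [list(row) for row in a]
    b[target] = [x + k * y for x, y in zip(a[target], a[source])]
    return b


M = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]  # arbitrary example matrix

print(det(M))                         # 39
print(det(scale_row(M, 1, 5)))        # 195, that is 5 * 39
print(det(swap_rows(M, 0, 2)))        # -39
print(det(add_multiple(M, 1, 0, 4)))  # 39
```

A numerical check on one matrix is not a proof, of course; it simply illustrates what the theorem predicts.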


Example 3.35 We can now use row operations to evaluate 


$$|A| = \begin{vmatrix} 1 & 2 & -1 & 4 \\ -1 & 3 & 0 & 2 \\ 2 & 1 & 1 & 2 \\ 1 & 4 & 1 & 3 \end{vmatrix}$$

by reducing A to an upper triangular matrix. First, we obtain zeros 
below the leading one by adding multiples of row 1 to the rows below. 
The new matrix will have the same determinant as A. So 

$$|A| = \begin{vmatrix} 1 & 2 & -1 & 4 \\ 0 & 5 & -1 & 6 \\ 0 & -3 & 3 & -6 \\ 0 & 2 & 2 & -1 \end{vmatrix}.$$

Next, we observe that 

$$\begin{vmatrix} 1 & 2 & -1 & 4 \\ 0 & 5 & -1 & 6 \\ 0 & -3 & 3 & -6 \\ 0 & 2 & 2 & -1 \end{vmatrix}
= -3\begin{vmatrix} 1 & 2 & -1 & 4 \\ 0 & 5 & -1 & 6 \\ 0 & 1 & -1 & 2 \\ 0 & 2 & 2 & -1 \end{vmatrix},$$

where we factored −3 from the third row. (We would need to multiply 
the resulting determinant on the right by −3 in order to put the −3 back 
into the third row, and to get back a matrix with the same determinant 
as A.) Next we switch row 2 and row 3, with the effect of changing the 
sign of the determinant. 

$$|A| = 3\begin{vmatrix} 1 & 2 & -1 & 4 \\ 0 & 1 & -1 & 2 \\ 0 & 5 & -1 & 6 \\ 0 & 2 & 2 & -1 \end{vmatrix}.$$

Next, we use RO3 operations to achieve upper triangular form. These 
operations result in no change in the value of the determinant. So we 
have 

$$|A| = 3\begin{vmatrix} 1 & 2 & -1 & 4 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 4 & -4 \\ 0 & 0 & 4 & -5 \end{vmatrix}
= 3\begin{vmatrix} 1 & 2 & -1 & 4 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 4 & -4 \\ 0 & 0 & 0 & -1 \end{vmatrix}.$$

Finally, we evaluate the determinant of the upper triangular matrix, 
obtaining 

$$|A| = 3\begin{vmatrix} 1 & 2 & -1 & 4 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 4 & -4 \\ 0 & 0 & 0 & -1 \end{vmatrix} = 3(1 \times 1 \times 4 \times (-1)) = -12.$$
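
The strategy of Example 3.35 (reduce to upper triangular form, keep track of the changes, multiply the diagonal entries) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' procedure: it uses only RO2 and RO3, so it never factors a constant out of a row as the example does, and it works with exact fractions from Python's standard `fractions` module. The name `det_by_row_reduction` is hypothetical.

```python
from fractions import Fraction


def det_by_row_reduction(rows):
    """Determinant via reduction to upper triangular form (RO2 and RO3 only)."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    sign = 1  # records the sign changes caused by row interchanges (RO2)
    for col in range(n):
        # find a row at or below the diagonal with a non-zero entry in this column
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot available: the determinant is 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        for r in range(col + 1, n):
            k = a[r][col] / a[col][col]
            # RO3: subtract k times the pivot row; the determinant is unchanged
            a[r] = [x - k * y for x, y in zip(a[r], a[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= a[i][i]  # product of the diagonal entries
    return result


A = [[1, 2, -1, 4],
     [-1, 3, 0, 2],
     [2, 1, 1, 2],
     [1, 4, 1, 3]]
print(det_by_row_reduction(A))  # -12
```

Because no row is ever rescaled, the only bookkeeping needed is the sign; the intermediate matrices therefore differ from those in the example, but the final value is the same.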


A word of caution with row operations on a determinant! What is the 
change in the value of |A| in the following circumstances: 


(1) if $R_2$ is replaced by $R_2 - 3R_1$? 
(2) if $R_2$ is replaced by $3R_1 - R_2$? 


For (1), there is no change, but for (2), the determinant will change sign. 
Why? Well, $3R_1 - R_2$ is actually two elementary row operations: first, 
we multiply row 2 by −1 and then we add three times row 1 to it. When 
performing row operation RO3, to leave the determinant unchanged, 
you should always add a multiple of another row to the row you are 
replacing. 

Activity 3.36 You can shorten the writing in the above example by 
expanding the 4 × 4 determinant using the first column as soon as you 
have obtained the determinant with zeros under the leading one. You 
will then be left with a 3 × 3 determinant to evaluate. Do this. Without 
looking at the example above, work through the calculations in this way 
to evaluate 

$$|A| = \begin{vmatrix} 1 & 2 & -1 & 4 \\ -1 & 3 & 0 & 2 \\ 2 & 1 & 1 & 2 \\ 1 & 4 & 1 & 3 \end{vmatrix}.$$


3.3.2 The determinant of a product 


One very important result concerning determinants can be stated as: 
‘the determinant of the product of two square matrices is the product of 
their determinants’. This is the content of the following theorem. 


Theorem 3.37 If A and B are n × n matrices, then 

$$|AB| = |A|\,|B|.$$


Proof: We will outline the proof of this theorem without filling in all 
the details. We first prove the theorem in the case when the matrix A is 
an elementary matrix. We use again the fact established in Section 3.1.1 
(page 92) that the matrix obtained by a row operation on the matrix B 
is equal to the product of the elementary matrix of that row operation 
times the matrix B. 

Let $E_1$ be an elementary matrix that multiplies a row by a non-zero 
constant k. Then $E_1B$ is the matrix obtained by performing that row 
operation on B, and by Theorem 3.34, $|E_1B| = k|B|$. For the same 
reason, $|E_1| = |E_1I| = k|I| = k$. Therefore, 

$$|E_1B| = k|B| = |E_1|\,|B|.$$


The argument for the other two types of elementary matrices follows 
the same steps. 


Activity 3.38 Try these. Show that if $E_2$ is an elementary matrix that 
switches two rows, then $|E_2B| = |E_2|\,|B|$, and do the same for an 
elementary matrix $E_3$ that adds a multiple of one row to another. 


So we assume that the theorem is true when A is any elementary matrix. 
Now recall that every matrix is row equivalent to a matrix in reduced 
row echelon form, so if R denotes the reduced row echelon form of the 
matrix A, then we can write 

$$A = E_rE_{r-1}\cdots E_1R,$$

where the $E_i$ are elementary matrices. Since A is a square matrix, R is 
either the identity matrix or a matrix with a row of zeros. 
Applying the result for an elementary matrix repeatedly, 

$$|A| = |E_rE_{r-1}\cdots E_1R| = |E_r|\,|E_{r-1}|\cdots|E_1|\,|R|,$$

where |R| is either 1 or 0. In fact, since the determinant of an elementary 
matrix must be non-zero, |R| = 0 if and only if |A| = 0. 
If R = I, then by repeated application of the result for elementary 
matrices, this time with the matrix B, 

$$\begin{aligned}
|AB| &= |(E_rE_{r-1}\cdots E_1I)B| \\
&= |E_rE_{r-1}\cdots E_1B| \\
&= |E_r|\,|E_{r-1}|\cdots|E_1|\,|B| \\
&= |E_rE_{r-1}\cdots E_1|\,|B| \\
&= |A|\,|B|.
\end{aligned}$$

If R ≠ I, then 

$$|AB| = |E_rE_{r-1}\cdots E_1RB| = |E_r|\,|E_{r-1}|\cdots|E_1|\,|RB|.$$

Since the product matrix RB must also have a row of zeros, |RB| = 0. 
Therefore, |AB| = 0 = 0|B| and the theorem is proved. 
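
Theorem 3.37 is easy to test on particular matrices. The sketch below is illustrative only: `det` and `matmul` are our own helper names, and the matrices `A`, `B` and `S` are arbitrary integer examples chosen for the check.

```python
def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[1, 2, 3], [-1, 2, 1], [4, 1, 1]]   # |A| = -16
B = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]   # |B| = 39
S = [[1, 2, 3], [2, 4, 6], [0, 1, 1]]    # two proportional rows, so |S| = 0

print(det(matmul(A, B)), det(A) * det(B))  # -624 -624
print(det(matmul(S, B)), det(S) * det(B))  # 0 0
```

The second line corresponds to the case R ≠ I in the proof: a matrix with zero determinant forces the product to have zero determinant as well.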


3.4 Matrix inverse using cofactors 
3.4.1 Using determinants to find an inverse 


We start with the following characterisation of invertible matrices. 


Theorem 3.39 If A is an n × n matrix, then A is invertible if and only 
if |A| ≠ 0. 


We will give two proofs of Theorem 3.39. The first follows easily from 
Theorem 3.8. The second is included because it gives us another method 
to calculate the inverse of a matrix. 


First proof of Theorem 3.39 


We have already established this theorem indirectly by our arguments 
in the previous section; we will repeat and collect them here. 

By Theorem 3.8, A is invertible if and only if the reduced row 
echelon form of A is the identity matrix. Let A be any n × n matrix and 
let R be the reduced row echelon form of A. Then either R is the identity 
matrix (in which case A is invertible) and |R| = 1, or R is a matrix with 
a row of zeros (in which case A is not invertible) and |R| = 0. 

As we have seen, row operations cannot alter the fact that a 
determinant is zero or non-zero. By performing a row operation, we might 
be multiplying the determinant by a non-zero constant, or by −1, or 
not changing the determinant at all. Therefore, we can conclude that 
|A| = 0 if and only if the determinant of its reduced row echelon form, 
R, is also 0, and |A| ≠ 0 if and only if |R| = 1. 

Putting these statements together, |A| ≠ 0 if and only if the reduced 
row echelon form of A is the identity; that is (by Theorem 3.8), if and 
only if A is invertible. 


Second proof of Theorem 3.39 
We will now prove Theorem 3.39 directly. Since it is an if and only if 
statement, we must prove both implications. 

First we show that if A is invertible, then |A| ≠ 0. We assume 
$A^{-1}$ exists, so that $AA^{-1} = I$. Then taking the determinant of both 
sides of this equation, $|AA^{-1}| = |I| = 1$. Applying Theorem 3.37 to 
the product, 

$$|AA^{-1}| = |A|\,|A^{-1}| = 1.$$

If the product of two real numbers is non-zero, then neither number can 
be zero, which proves that |A| ≠ 0. 
As a consequence of this argument, we have the bonus result that 

$$|A^{-1}| = \frac{1}{|A|}.$$

We now show the other implication; that is, if |A| ≠ 0, then A is 
invertible. To do this, we will construct $A^{-1}$, and to do this we need some 
definitions. 


Definition 3.40 If A is an n × n matrix, the matrix of cofactors of A is 
the matrix whose (i, j) entry is $C_{ij}$, the (i, j) cofactor of A. The adjoint 
(also sometimes called the adjugate) of the matrix A is the transpose 
of the matrix of cofactors. That is, the adjoint of A, adj(A), is the 
matrix 

$$\operatorname{adj}(A) = \begin{pmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{pmatrix}.$$


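Notice that column 1 of this matrix consists of the cofactors of row 1 of 
A (and row 1 consists of the cofactors of column 1 of A), and similarly 
for each column and row. 

We now multiply the matrix A with its adjoint matrix: 

$$A\operatorname{adj}(A) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{pmatrix}.$$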


Look carefully at what each entry of the product will be. 

The (1,1) entry is $a_{11}C_{11} + a_{12}C_{12} + \cdots + a_{1n}C_{1n}$. This is the 
cofactor expansion of |A| by row 1. 

The (1,2) entry is $a_{11}C_{21} + a_{12}C_{22} + \cdots + a_{1n}C_{2n}$. This consists of 
the cofactors of row 2 of A multiplied by the entries of row 1, so this is 
equal to 0 by Corollary 3.30. 

Continuing in this way, we see that the entries on the main diagonal 
of the product are all equal to |A|, and all entries off the main diagonal 
are equal to 0. That is, 


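$$A\operatorname{adj}(A) = \begin{pmatrix} |A| & 0 & \cdots & 0 \\ 0 & |A| & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & |A| \end{pmatrix} = |A|\,I,$$

since |A| is just a real number, a scalar. 
We know |A| ≠ 0, so we can divide both sides of the equation by 
|A| to obtain 

$$A\left(\frac{1}{|A|}\operatorname{adj}(A)\right) = I.$$

This implies that $A^{-1}$ exists and is equal to 

$$A^{-1} = \frac{1}{|A|}\operatorname{adj}(A).$$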


This gives not only a proof of Theorem 3.39, but a useful method to 
calculate the inverse of a matrix using cofactors. 

Example 3.41 Find $A^{-1}$ for the matrix 

$$A = \begin{pmatrix} 1 & 2 & 3 \\ -1 & 2 & 1 \\ 4 & 1 & 1 \end{pmatrix}.$$

The first thing to do is to calculate |A| to see if A is invertible. Using 
the cofactor expansion by row 1, 

$$|A| = 1(2 - 1) - 2(-1 - 4) + 3(-1 - 8) = -16 \neq 0.$$

We then calculate the minors: for example 

$$M_{11} = \begin{vmatrix} 2 & 1 \\ 1 & 1 \end{vmatrix} = 1,$$

and we can fill in the chart below 

$$\begin{array}{lll}
M_{11} = 1 & M_{12} = -5 & M_{13} = -9 \\
M_{21} = -1 & M_{22} = -11 & M_{23} = -7 \\
M_{31} = -4 & M_{32} = 4 & M_{33} = 4.
\end{array}$$

Next, we change the minors into cofactors, by multiplying by −1 those 
minors with i + j equal to an odd number. Finally, we transpose the 
result to form the adjoint matrix, so that 

$$A^{-1} = \frac{1}{|A|}\operatorname{adj}(A) = -\frac{1}{16}\begin{pmatrix} 1 & 1 & -4 \\ 5 & -11 & -4 \\ -9 & 7 & 4 \end{pmatrix}.$$

As with all calculations, it is easy to make a mistake. Therefore, having 
found $A^{-1}$, the next thing you should do is check your result by showing 
that $AA^{-1} = I$, 

$$-\frac{1}{16}\begin{pmatrix} 1 & 2 & 3 \\ -1 & 2 & 1 \\ 4 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 & -4 \\ 5 & -11 & -4 \\ -9 & 7 & 4 \end{pmatrix}
= -\frac{1}{16}\begin{pmatrix} -16 & 0 & 0 \\ 0 & -16 & 0 \\ 0 & 0 & -16 \end{pmatrix} = I.$$

Activity 3.42 Use this method to find the inverse of the matrix 

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 0 \\ 5 & 6 & 7 \end{pmatrix}.$$

Check your result. 


Remember: The adjoint matrix only contains the cofactors of A; the 
(i, j) entry is the cofactor $C_{ji}$ of A. A common error is to attempt to 
form the adjoint by multiplying cofactors by entries of the matrix. But 
the entries of a matrix A only multiply the cofactors when calculating 
the determinant of A, |A|. 


3.4.2 Cramer’s rule 


If A is a square matrix with |A| ≠ 0, then Cramer’s rule gives us an 
alternative method of solving a system of linear equations Ax = b. 


Theorem 3.43 (Cramer’s rule) If A is n × n, |A| ≠ 0, and $b \in \mathbb{R}^n$, 
then the solution $x = (x_1, x_2, \dots, x_n)^{T}$ of the linear system Ax = b is 
given by 

$$x_i = \frac{|A_i|}{|A|},$$

where, here, $A_i$ is the matrix obtained from A by replacing the ith 
column with the vector b. 


Before we prove this theorem, let’s see how it works. 


Example 3.44 Use Cramer’s rule to find the solution of the linear 
system 


$$\begin{aligned}
x + 2y + 3z &= 7 \\
-x + 2y + z &= -3 \\
4x + y + z &= 5.
\end{aligned}$$


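In matrix form Ax = b, this system is 

$$\begin{pmatrix} 1 & 2 & 3 \\ -1 & 2 & 1 \\ 4 & 1 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 7 \\ -3 \\ 5 \end{pmatrix}.$$

We first check that |A| ≠ 0. This is the same matrix A as in 
Example 3.41, and we have |A| = −16. Then, applying Cramer’s rule, 
we find x by evaluating the determinant of the matrix obtained from A 
by replacing column 1 with b, and divide this by |A|: 

$$x = \frac{\begin{vmatrix} 7 & 2 & 3 \\ -3 & 2 & 1 \\ 5 & 1 & 1 \end{vmatrix}}{|A|} = \frac{-16}{-16} = 1,$$

and, in the same way, we obtain y and z: 

$$y = \frac{\begin{vmatrix} 1 & 7 & 3 \\ -1 & -3 & 1 \\ 4 & 5 & 1 \end{vmatrix}}{|A|} = \frac{48}{-16} = -3,
\qquad
z = \frac{\begin{vmatrix} 1 & 2 & 7 \\ -1 & 2 & -3 \\ 4 & 1 & 5 \end{vmatrix}}{|A|} = \frac{-64}{-16} = 4,$$

which can easily be checked by substitution into the original equations 
(or by multiplying Ax). 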


We now prove Cramer’s rule. 


Proof. Since |A| ≠ 0, $A^{-1}$ exists, and we can solve for x by multiplying 
Ax = b on the left by $A^{-1}$. Then $x = A^{-1}b$: 

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= \frac{1}{|A|}\begin{pmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.$$

The entry $x_i$ of the solution is equal to the ith row of this product, 

$$x_i = \frac{1}{|A|}(b_1C_{1i} + b_2C_{2i} + \cdots + b_nC_{ni}).$$

Stare at this expression a moment. The cofactors all come from row i 
of the adjoint matrix, and they are the cofactors of column i of A, so 
this looks like a cofactor expansion by column i of a matrix which is 
identical to A except in column i, where the entries are the components 
of the vector b. That is, the term in brackets is the cofactor expansion 
by column i of the matrix A with column i replaced by the vector b; in 
other words, it is $|A_i|$. 


To summarise, for a system Ax = b, where A is square and |A| ≠ 0, to 
find $x_i$ using Cramer’s rule: 


(1) replace column i of A by b, 
(2) evaluate the determinant of the resulting matrix, 
(3) divide by |A|. 
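
The three steps above can be sketched as a short function. This is a minimal sketch under our own assumptions: the names `det` and `cramer` are hypothetical, the entries are assumed to be integers, columns are indexed from 0, and the determinants are computed by cofactor expansion, which is only sensible for small systems.

```python
from fractions import Fraction


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def cramer(a, b):
    """Solve Ax = b for a square integer matrix A with |A| != 0."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's rule requires |A| != 0")
    solution = []
    for i in range(len(a)):
        # step (1): replace column i of A by b
        a_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(a)]
        # steps (2) and (3): evaluate the determinant and divide by |A|
        solution.append(Fraction(det(a_i), d))
    return solution


A = [[1, 2, 3], [-1, 2, 1], [4, 1, 1]]
b = [7, -3, 5]
print([str(v) for v in cramer(A, b)])  # ['1', '-3', '4']
```

The call uses the system of Example 3.44; substituting x = 1, y = −3, z = 4 into the three equations confirms the result.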


Cramer’s rule is quite an attractive way to solve linear systems of 
equations, but it should be stressed that it has fairly limited applicability. 
It only works for square systems, and only for those square systems 
where the coefficient matrix is invertible. By contrast, the Gaussian 
elimination method works in all cases: it can be used for non-square 
systems, square systems in which the matrix is not invertible, systems 
with infinitely many solutions; and it can also detect when a system is 
inconsistent. 


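Activity 3.45 Can you think of another method to obtain the solution 
to the above example? One way is to use the inverse matrix. Do this. 
We found $A^{-1}$ in Example 3.41. Now use it to find the solution x of 

$$\begin{pmatrix} 1 & 2 & 3 \\ -1 & 2 & 1 \\ 4 & 1 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 7 \\ -3 \\ 5 \end{pmatrix}$$

by calculating $x = A^{-1}b$. 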


3.5 Leontief input-output analysis 


In 1973, Wassily Leontief was awarded the Nobel prize in Economics for 
work he did analysing an economy with many interdependent industries 
using linear algebra. We present a brief outline of his method here. 

Suppose an economy has n interdependent production processes, 
where the outputs of the industries are used to run the industries and 
to satisfy an outside demand. Assume that prices are fixed so that they 
can be used to measure the output. The problem we wish to solve is 
to determine the level of output of each industry which will satisfy 
all demands exactly; that is, both the demands of the other industries 
and the outside demand. The problem can be described as a system of 
linear equations, as we shall see by considering the following simple 
example. 


Example 3.46 Suppose there are two industries: water and electricity. 
Let 

x₁ = total output of water ($ value) 
x₂ = total output of electricity ($ value). 

We can express this as a vector 

$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

called a production vector. Suppose we know that the production 
of water requires both water and electricity as inputs, and that the 
production of electricity requires both water and electricity as inputs. 
Specifically, suppose the following is known: 

    water production needs          $0.01 water and $0.15 electricity    to produce $1.00 water 
    electricity production needs    $0.21 water and $0.05 electricity    to produce $1.00 electricity. 


What is the total water used by the industries? Water is using $0.01 for 
each unit output, so a total of 0.01x₁; and electricity is using $0.21 water 
for each unit of its output, so a total of 0.21x₂. The total amount of water 
used by the industries is therefore 0.01x₁ + 0.21x₂. In the same way, 
the total amount of electricity used by the industries is 0.15x₁ + 0.05x₂. 
The totals can be expressed as 

$$\begin{pmatrix} \text{water} \\ \text{electricity} \end{pmatrix} = \begin{pmatrix} 0.01 & 0.21 \\ 0.15 & 0.05 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = Cx.$$

The matrix C is known as a consumption matrix or a technology matrix. 
After the industries have used water and electricity to produce their 
outputs, how much water and electricity are left to satisfy the outside 
demand? 


Activity 3.47 Think about this before continuing. Write down an 
expression for the total amount of water which is left after the industries 
have each used what they need to produce their output. Do the same for 
electricity. 


Let d₁ denote the outside demand for water, and d₂ the demand for 
electricity. Then in order for the output of these industries to supply the 
industries and also to satisfy the outside demand exactly, the following 
equations must be satisfied: 

$$\begin{aligned}
x_1 - 0.01x_1 - 0.21x_2 &= d_1 \quad \text{(water)} \\
x_2 - 0.15x_1 - 0.05x_2 &= d_2 \quad \text{(electricity)}.
\end{aligned}$$

In matrix notation, 

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - \begin{pmatrix} 0.01 & 0.21 \\ 0.15 & 0.05 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} d_1 \\ d_2 \end{pmatrix},$$

or, x − Cx = d, where 

$$d = \begin{pmatrix} d_1 \\ d_2 \end{pmatrix}$$

is the outside demand vector. If we use the fact that Ix = x, where 
I is the 2 × 2 identity matrix, then we can rewrite this system in matrix 
form as 

$$Ix - Cx = d, \quad \text{or} \quad (I - C)x = d.$$


This is now in the usual matrix form for a system of linear equations. A 
solution, x, to this system of equations will determine the output levels 
of each industry required to satisfy all demands exactly. 


Now let’s look at the general case. Suppose we have an economy with n 
interdependent industries. If cᵢⱼ denotes the amount of industry i used 
by industry j to produce $1.00 of industry j, then the consumption or 
technology matrix is C = (cᵢⱼ): 

$$C = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{pmatrix},$$

where: 


• row i lists the amounts of industry i used by each industry 
• column j lists the amounts of each industry used by industry j. 


If, as before, we denote by d the n × 1 outside demand vector, then 
in matrix form the problem we wish to solve is to find the production 
vector x such that 

$$(I - C)x = d,$$

a system of n linear equations in n unknowns. 
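
For a concrete consumption matrix, the system (I − C)x = d can be solved mechanically. The sketch below is one possible illustration, not a prescribed method: the function names `solve` and `production` are our own, the demand vector used in the call is a hypothetical one chosen so that the answer comes out in round numbers, and exact fractions are used to avoid rounding. The matrix C is the one from Example 3.46.

```python
from fractions import Fraction


def solve(m, rhs):
    """Solve m x = rhs by Gauss-Jordan elimination (m square and invertible)."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(m, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the coefficient matrix is not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]  # make the pivot 1
        for r in range(n):
            if r != col:
                k = aug[r][col]
                aug[r] = [v - k * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def production(c, d):
    """Production vector x satisfying (I - C)x = d."""
    n = len(c)
    i_minus_c = [[(1 if i == j else 0) - Fraction(c[i][j]) for j in range(n)]
                 for i in range(n)]
    return solve(i_minus_c, d)


# consumption matrix of Example 3.46 (water, electricity)
C = [[Fraction("0.01"), Fraction("0.21")],
     [Fraction("0.15"), Fraction("0.05")]]
d = [570, 1750]  # hypothetical outside demand, in dollars

print([float(v) for v in production(C, d)])  # [1000.0, 2000.0]
```

With this demand, water must produce $1,000 and electricity $2,000 of output: for water, 1000 − 0.01(1000) − 0.21(2000) = 570, and for electricity, 2000 − 0.15(1000) − 0.05(2000) = 1750.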


Activity 3.48 Return to the example given above and assume that the 
public demand for water is $627 and for electricity is $4,955. Find the 
levels of output which satisfy all demands exactly. 


3.6 Learning outcomes 


You should now be able to: 


• say what is meant by an elementary matrix, and understand how 
they are used for row operations 

• find the inverse of a matrix using row operations 

• find the determinant of a square matrix and use it to determine if a 
matrix is invertible 

• find the inverse of a matrix using cofactors 

• solve a system of linear equations using Cramer’s rule 

• say what is meant by the Leontief input-output model and solve 
input-output analysis problems. 


In addition, you should know that: 


• There are three methods to solve Ax = b if A is n × n and |A| ≠ 0: 
(1) Gaussian elimination 
(2) find $A^{-1}$, then calculate $x = A^{-1}b$ 
(3) Cramer’s rule. 
• There is one method to solve Ax = b if A is m × n and m ≠ n, or 
if |A| = 0: 
(1) Gaussian elimination. 
• There are two methods to find $A^{-1}$: 
(1) using cofactors for the adjoint matrix 
(2) by row reduction of (A | I) to (I | $A^{-1}$). 
• If A is an n × n matrix, then the following statements are equivalent 
(Theorems 3.8 and 3.39): 
(1) A is invertible. 
(2) Ax = b has a unique solution for any $b \in \mathbb{R}^n$. 
(3) Ax = 0 has only the trivial solution, x = 0. 
(4) the reduced row echelon form of A is I. 
(5) |A| ≠ 0. 


3.7 Comments on activities 


Activity 3.2 Only the last matrix is an elementary matrix, representing 
the operation $R_3 - R_1$ on I. The others each represent two row 
operations. For example, 

$$\begin{pmatrix} 2 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = E_2E_1,$$

where $E_1$ represents $2R_1$ and $E_2$ represents $R_1 + R_2$. You should 
multiply the matrices in the opposite order, $E_1E_2$, and notice the effect, 
thinking about the row operations on I. 


Activity 3.4 The matrix E is the identity matrix after the row operation 
$R_2 - 4R_1$ has been performed on it, so the inverse matrix is the identity 
matrix after $R_2 + 4R_1$, 

$$E^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Multiply out $EE^{-1}$ and $E^{-1}E$ as instructed. 


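Activity 3.11 For the matrix $A$,

$$(A \mid I) = \left(\begin{array}{ccc|ccc} -2 & 1 & 3 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 & 1 & 0 \\ 1 & 2 & 0 & 0 & 0 & 1 \end{array}\right) \xrightarrow{R_1 \leftrightarrow R_3} \left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & 0 & 1 \\ 0 & -1 & 1 & 0 & 1 & 0 \\ -2 & 1 & 3 & 1 & 0 & 0 \end{array}\right)$$

$$\xrightarrow{R_3 + 2R_1} \left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & 0 & 1 \\ 0 & -1 & 1 & 0 & 1 & 0 \\ 0 & 5 & 3 & 1 & 0 & 2 \end{array}\right) \xrightarrow{-R_2} \left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & 0 & 1 \\ 0 & 1 & -1 & 0 & -1 & 0 \\ 0 & 5 & 3 & 1 & 0 & 2 \end{array}\right)$$

$$\xrightarrow{R_3 - 5R_2} \left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & 0 & 1 \\ 0 & 1 & -1 & 0 & -1 & 0 \\ 0 & 0 & 8 & 1 & 5 & 2 \end{array}\right) \xrightarrow{\frac{1}{8}R_3} \left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & 0 & 1 \\ 0 & 1 & -1 & 0 & -1 & 0 \\ 0 & 0 & 1 & \frac{1}{8} & \frac{5}{8} & \frac{1}{4} \end{array}\right)$$

$$\xrightarrow{R_2 + R_3} \left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & \frac{1}{8} & -\frac{3}{8} & \frac{1}{4} \\ 0 & 0 & 1 & \frac{1}{8} & \frac{5}{8} & \frac{1}{4} \end{array}\right) \xrightarrow{R_1 - 2R_2} \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -\frac{1}{4} & \frac{3}{4} & \frac{1}{2} \\ 0 & 1 & 0 & \frac{1}{8} & -\frac{3}{8} & \frac{1}{4} \\ 0 & 0 & 1 & \frac{1}{8} & \frac{5}{8} & \frac{1}{4} \end{array}\right).$$

So,

$$A^{-1} = \frac{1}{8} \begin{pmatrix} -2 & 6 & 4 \\ 1 & -3 & 2 \\ 1 & 5 & 2 \end{pmatrix}.$$

Now check that $AA^{-1} = I$.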

When you carry out the row reduction, it is not necessary to always 
indicate the separation of the two matrices by a line as we have done so 
far. You just need to keep track of what you are doing. 

In the calculation for the inverse of B, we have omitted the line but 
added a bit of space to make it easier for you to read. 


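$$(B \mid I) = \begin{pmatrix} 2 & 1 & 3 & \; 1 & 0 & 0 \\ 0 & -1 & 1 & \; 0 & 1 & 0 \\ 1 & 2 & 0 & \; 0 & 0 & 1 \end{pmatrix} \xrightarrow{R_1 \leftrightarrow R_3} \begin{pmatrix} 1 & 2 & 0 & \; 0 & 0 & 1 \\ 0 & -1 & 1 & \; 0 & 1 & 0 \\ 2 & 1 & 3 & \; 1 & 0 & 0 \end{pmatrix}$$

$$\xrightarrow{R_3 - 2R_1} \begin{pmatrix} 1 & 2 & 0 & \; 0 & 0 & 1 \\ 0 & -1 & 1 & \; 0 & 1 & 0 \\ 0 & -3 & 3 & \; 1 & 0 & -2 \end{pmatrix} \xrightarrow{-R_2} \begin{pmatrix} 1 & 2 & 0 & \; 0 & 0 & 1 \\ 0 & 1 & -1 & \; 0 & -1 & 0 \\ 0 & -3 & 3 & \; 1 & 0 & -2 \end{pmatrix}$$

$$\xrightarrow{R_3 + 3R_2} \begin{pmatrix} 1 & 2 & 0 & \; 0 & 0 & 1 \\ 0 & 1 & -1 & \; 0 & -1 & 0 \\ 0 & 0 & 0 & \; 1 & -3 & -2 \end{pmatrix},$$

which indicates that the matrix $B$ is not invertible; it is not row equivalent to the identity matrix.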
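
The row reduction of $(A \mid I)$ to $(I \mid A^{-1})$ can be sketched in Python as follows. This is a minimal illustration using exact fractions; the function name `inverse` is an assumption, and the entries of `A` and `B` are the matrices of this activity as read from the row reductions above.

```python
from fractions import Fraction

def inverse(matrix):
    """Return the inverse via row reduction of (A | I), or None if A is singular."""
    n = len(matrix)
    # Build the augmented matrix (A | I) with exact rational entries.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Look for a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None  # no leading one available: A is not row equivalent to I
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]      # create the leading one
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]                  # right-hand block is the inverse

A = [[-2, 1, 3], [0, -1, 1], [1, 2, 0]]
B = [[2, 1, 3], [0, -1, 1], [1, 2, 0]]
A_inv = inverse(A)
B_inv = inverse(B)
print(A_inv)
print(B_inv)  # None, since B is not invertible
```

The order of row operations chosen by the code need not match the order used by hand, but the reduced row echelon form, and hence the inverse, is the same.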


Activity 3.16 C)3 = 13. 


Activity 3.18 $|M| = -1(8 - 3) - 2(0 - 3) + 1(0 - 2) = -1$.


Activity 3.21 You should either expand by column 1 or row 2. For example, using column 1: $|M| = -1(8 - 3) + 1(6 - 2) = -1$.


Activity 3.36 
os 1 6|_|5 7 8 
|A| = =|=3 3 Bly 
0 -3 3 —6 2 > -] 
0 2 2 -l 


At this stage you can expand the 3 x 3 matrix using a cofactor expan- 
sion, or continue a bit more with row operations: 


1=1 2 1-1 2 a 
ai=3 <1 6 |=Slo 4 <4 =a) 75 |= 3-4) = -12. 
22 -1 0 4 -5 
Activity 3.42 
$|A| = -32 \neq 0$
; pg 4 -2\ ,/-7-1 3 
w= gio- 0 r o J=a(0 2 o): 
|4] 32 -20 4 4 ae ee 


Activity 3.47 The total water output remaining is $x_1 - 0.01x_1 - 0.21x_2$, and the total electricity output left is $x_2 - 0.15x_1 - 0.05x_2$.


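Activity 3.48 Solve $(I - C)\mathbf{x} = \mathbf{d}$ by Gaussian elimination, where

$$C = \begin{pmatrix} 0.01 & 0.21 \\ 0.15 & 0.05 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad \mathbf{d} = \begin{pmatrix} 627 \\ 4955 \end{pmatrix}.$$

Reducing the augmented matrix,

$$((I - C) \mid \mathbf{d}) = \left(\begin{array}{cc|c} 0.99 & -0.21 & 627 \\ -0.15 & 0.95 & 4955 \end{array}\right) \to \left(\begin{array}{cc|c} 33 & -7 & 20900 \\ -3 & 19 & 99100 \end{array}\right)$$

$$\to \left(\begin{array}{cc|c} 1 & -7/33 & 1900/3 \\ -3 & 19 & 99100 \end{array}\right) \to \left(\begin{array}{cc|c} 1 & -7/33 & 1900/3 \\ 0 & 202/11 & 101000 \end{array}\right)$$

$$\to \left(\begin{array}{cc|c} 1 & -7/33 & 1900/3 \\ 0 & 1 & 5500 \end{array}\right) \to \left(\begin{array}{cc|c} 1 & 0 & 1800 \\ 0 & 1 & 5500 \end{array}\right).$$

So

$$\mathbf{x} = \begin{pmatrix} 1800 \\ 5500 \end{pmatrix}.$$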
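
The calculation in Activity 3.48 can be checked with a short Python sketch that forms $I - C$ and solves $(I - C)\mathbf{x} = \mathbf{d}$ by elimination with exact fractions. The helper name `solve` is an assumption, and the second component of `d` (4955) is taken as read from the reduction above.

```python
from fractions import Fraction

def solve(a, b):
    """Solve a square system a x = b by Gauss-Jordan elimination.

    Assumes the system has a unique solution.
    """
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

# Consumption matrix and external demand, entered as exact decimals.
C = [[Fraction("0.01"), Fraction("0.21")],
     [Fraction("0.15"), Fraction("0.05")]]
d = [Fraction(627), Fraction(4955)]

# Form I - C entry by entry.
I_minus_C = [[(1 if i == j else 0) - C[i][j] for j in range(2)] for i in range(2)]
x = solve(I_minus_C, d)
print(x)  # production levels x_1 (water) and x_2 (electricity)
```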


3.8 Exercises 


Exercise 3.1 Use elementary row operations to find any inverses of the 
following matrices. 


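$$A = \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 2 \\ 3 & 8 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} -1 & 2 & 1 \\ 0 & 1 & 2 \\ 3 & 1 & 4 \end{pmatrix}.$$

Let $\mathbf{b} = \begin{pmatrix} 1 \\ 1 \\ 5 \end{pmatrix}$. Find all solutions to $A\mathbf{x} = \mathbf{b}$. Find all solutions to $B\mathbf{x} = \mathbf{b}$.
Is there a vector $\mathbf{d} \in \mathbb{R}^3$ for which $A\mathbf{x} = \mathbf{d}$ is inconsistent? Is there a vector $\mathbf{d} \in \mathbb{R}^3$ for which $B\mathbf{x} = \mathbf{d}$ is inconsistent? In each case, justify your answer and find such a vector $\mathbf{d}$ if one exists.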


Exercise 3.2 Use elementary row operations to reduce the matrix 


$$A = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \\ 1 & 4 & -1 \end{pmatrix}$$

to the identity matrix. Hence, write $A$ as a product of elementary matrices.

Use this to evaluate $|A|$ as a product of matrices, then check your answer by evaluating $|A|$ using a cofactor expansion.


Exercise 3.3 Evaluate each of the following determinants using a 
cofactor expansion along an appropriate row or column. 


Te ee Te. 
(a) oe : : 1 4 0 3 
01 0 1 
Exercise 3.4 Suppose $w \in \mathbb{R}$ and $B$ is the matrix


2 1 w 
s=; 4 =), 
1 —2 7 


For what values of w is the determinant of B equal to 0? 


Exercise 3.5 Evaluate the following determinant using row operations 
to simplify the calculation: 


S 24 2 
-3 1 5 1 
—4 3 1I 3 
2 1 -1 1 


Check your answer by evaluating it a second time using column 
operations. 


Exercise 3.6 For which values of $\lambda$ is the matrix

$$A = \begin{pmatrix} 7 - \lambda & -15 \\ 2 & -4 - \lambda \end{pmatrix}$$

not invertible?


Exercise 3.7 Suppose $A$ is a $3 \times 3$ matrix with $|A| = 7$. Find $|2A|$,
A AAEN


Exercise 3.8 Use the method of the adjoint matrix to find the inverse 
of each of the following matrices, if it exists. 


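$$B = \begin{pmatrix} -1 & 2 & 1 \\ 0 & 1 & 2 \\ 3 & 1 & 4 \end{pmatrix}, \qquad C = \begin{pmatrix} 5 & 2 & -1 \\ 1 & 3 & 4 \\ 6 & 5 & 3 \end{pmatrix}.$$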

Exercise 3.9 Write out the system of equations Bx = b, where 

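$$B = \begin{pmatrix} -1 & 2 & 1 \\ 0 & 1 & 2 \\ 3 & 1 & 4 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 1 \\ 1 \\ 5 \end{pmatrix}.$$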

Find the solution using Cramer’s rule. 


Exercise 3.10 Consider an economy with three industries,

$i_1$: water, $i_2$: electricity, $i_3$: gas,

interlinked so that the corresponding consumption matrix is

$$C = \begin{pmatrix} 0.2 & 0.3 & 0.2 \\ 0.4 & 0.1 & 0.2 \\ 0 & 0 & 0.1 \end{pmatrix}.$$

Each week the external demands for water, electricity and gas are, respectively,

$$d_1 = \$40{,}000, \qquad d_2 = \$100{,}000, \qquad d_3 = \$72{,}000.$$


(a) How much water, electricity and gas is needed to produce $1 worth 
of electricity? 

(b) What should be the weekly production of each industry in order 
to satisfy all demands exactly? 


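Exercise 3.11 The vector product or cross product of two vectors is defined in $\mathbb{R}^3$ as follows. If

$$\mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}, \quad \mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \mathbf{e}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},$$

then $\mathbf{a} \times \mathbf{b}$ is the vector given by

$$\mathbf{a} \times \mathbf{b} = \begin{vmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \mathbf{e}_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} = (a_2 b_3 - a_3 b_2)\mathbf{e}_1 - (a_1 b_3 - a_3 b_1)\mathbf{e}_2 + (a_1 b_2 - a_2 b_1)\mathbf{e}_3.$$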


(That determinant might look odd to you since it has vectors as some of 
its entries: but really this is just an extension of the earlier notation, and 
you can take the second equation as the definition of what it means.) 
The vector a x b is perpendicular to both a and b (see part (b)). 
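
To see the definition in action, here is a minimal Python sketch of the component formula above, together with a dot product used to test perpendicularity. The function names `cross` and `dot` and the two sample vectors are illustrative assumptions; they are not the vectors of part (a).

```python
def cross(a, b):
    """Cross product in R^3, following the cofactor expansion along the first row."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, -(a1 * b3 - a3 * b1), a1 * b2 - a2 * b1)

def dot(a, b):
    """Standard inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

# Sample vectors chosen only for illustration.
a = (1, 2, 3)
b = (4, 0, -1)
w = cross(a, b)
print(w)                     # (-2, 13, -8)
print(dot(w, a), dot(w, b))  # 0 0, so w is perpendicular to both a and b
```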


3 4 
Check that w is perpendicular to both u and v. 


(b) Show that for general vectors $\mathbf{a}, \mathbf{b}, \mathbf{c} \in \mathbb{R}^3$ the scalar triple product, $\langle \mathbf{a}, \mathbf{b} \times \mathbf{c} \rangle$, is given by the following determinant:


1 2 
(a) Calculate w = u x v forthe vectors u = 2 and v = (> ) : 


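$$\langle \mathbf{a}, \mathbf{b} \times \mathbf{c} \rangle = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} \qquad (*)$$

Use this and properties of the determinant to show that the vector $\mathbf{b} \times \mathbf{c}$ is perpendicular to both $\mathbf{b}$ and $\mathbf{c}$.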
(c) Show that the vectors a, b, c are coplanar (lie in the same plane) 
if and only if the determinant (*) is equal to 0. 
Find the constant ¢ if the vectors 


A 
3.9 Problems


Problem 3.1 Use elementary row operations to find inverses of each 
of the following matrices when the matrix has an inverse. 


1 2 3 [i 2 7 : S 

A=|2 3 0 B=ù|/2 3 0 C= 
0 1 2 0 1 6 Ae eae 
00 1 0 


Is C an elementary matrix? If the answer is ‘yes’, what operation does 
it perform? If the answer is ‘no’, write it as a product of elementary 
matrices. 


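Problem 3.2 Given a system of equations $A\mathbf{x} = \mathbf{b}$ for several different values of $\mathbf{b}$, it is often more practical to find $A^{-1}$, if it exists, and then to find the solutions using $\mathbf{x} = A^{-1}\mathbf{b}$.

Use this method to solve $A\mathbf{x} = \mathbf{b}_r$, for the matrix

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 0 \\ 0 & 1 & 2 \end{pmatrix},$$

and for each of the following vectors $\mathbf{b}_r$, $r = 1, 2, 3$:

$$\text{(a) } \mathbf{b}_1 = \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix}; \qquad \text{(b) } \mathbf{b}_2 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}; \qquad \text{(c) } \mathbf{b}_3 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}.$$

Be certain your solution for $A^{-1}$ is correct before carrying out this problem by checking that $AA^{-1} = I$.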


Problem 3.3 Evaluate the following determinants using the cofactor 
expansion along an appropriate row or column. 


2 51 
GII Os oh 
a ee 
7 5 2 3 
2 0 0 0 
®© i 2 0 0 
23 57 1 -1 
1210 
32 10 
(OE gs a gah 
0 111 
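$$\text{(d) } \begin{vmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} \qquad \text{(e) } \begin{vmatrix} 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 3 & 2 \\ 0 & 0 & 0 & 2 & 9 & 3 \\ 0 & 0 & 1 & 0 & 7 & 4 \\ 0 & 6 & 9 & 8 & 7 & 5 \\ 1 & 3 & 4 & 2 & 9 & 6 \end{vmatrix}$$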


Problem 3.4 Let 


$$B = \begin{pmatrix} 3 & t & -2 \\ -1 & 5 & 3 \\ 2 & 1 & 1 \end{pmatrix}.$$

For what values of $t$ is the determinant of $B$ equal to zero?


Problem 3.5 Evaluate the following determinants (use row operations 
to simplify the calculation). 


| | 
~a A 
Dn w 


(a) 


NOR NO 
l 

= N 

© 

— 

D 


(b) 


NN 

| 

oa 
CADWW PROUD 


NNOO 


| 
= 


3a? 
2b? 


c2 


Ww 
STS ocoworna 
WN 


(c) 


Re N U 
N 


Q 


Problem 3.6 Consider the matrix 


II 
—~ 
N 
| 
=> 
(0S) 
NS 
=> 
M 
A 


For which values of $\lambda$ will the matrix equation $A\mathbf{x} = \mathbf{0}$ have non-trivial solutions?
Problem 3.7 Use the method of the adjoint matrix to find the inverse of each of the following matrices, if it exists.

$$A = \begin{pmatrix} 2 & 0 & -3 \\ 0 & 3 & 1 \\ -1 & 4 & 2 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 & 2 \\ 2 & 1 & 5 \\ 0 & -1 & 1 \end{pmatrix}, \qquad C = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 1 \\ 2 & 1 & -1 \end{pmatrix}.$$


Problem 3.8 Express the following system of equations in matrix form, 
as Ax = b. 


2x —y+5z=2 
x+y-—2z=1 
=3x —2y t2 = =), 
Solve the system using each of the three matrix methods. Solve it by Gaussian elimination, by using $A^{-1}$, and by using Cramer's rule. Express your solution in vector form.


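Problem 3.9 Use Cramer's rule to find the value of $x, y, z$ for system (a) and to find the value of $z$ for system (b) where $a, b$ are constants, $a \neq \pm b$, $a \neq 2b$.

$$\text{(a)} \quad \begin{aligned} x + y + z &= 8 \\ 2x + y - z &= 3 \\ -x + 2y + z &= 3. \end{aligned}$$

$$\text{(b)} \quad \begin{aligned} ax - ay + bz &= a + b \\ bx - by + az &= 0 \\ -ax + 2by + 3z &= a - b. \end{aligned}$$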


Problem 3.10 Prove the following statement using either determinants or Theorem 3.12.
If $A$ and $B$ are $n \times n$ matrices and $(AB)^{-1}$ exists, then $A$ and $B$ are invertible.




4 


Rank, range and linear 
equations 


In this short chapter, we aim to extend and consolidate what we have 
learned so far about systems of equations and matrices, and tie together 
many of the results of the previous chapters. We will intersperse an 
overview of the previous two chapters with two new concepts, the rank 
of a matrix and the range of a matrix. 

This chapter will serve as a synthesis of what we have learned so 
far, in anticipation of a return to these topics later. 


4.1 The rank of a matrix 
4.1.1 The definition of rank 


Any matrix A can be reduced to a matrix in reduced row echelon form 
by elementary row operations. You just have to follow the algorithm and 
you will obtain first a row-equivalent matrix which is in row echelon 
form, and then, continuing with the algorithm, a row-equivalent matrix 
in reduced row echelon form (see Section 3.1.2). Another way to say 
this is: 


• Any matrix $A$ is row-equivalent to a matrix in reduced row echelon form.


There are several ways of defining the rank of a matrix, and we shall 
meet some other (more sophisticated) ways later. All are equivalent. We 
begin with the following definition: 
Definition 4.1 (Rank of a matrix) The rank, rank(A), of a matrix A is
the number of non-zero rows in a row echelon matrix obtained from A 
by elementary row operations. 


Notice that the definition only requires that the matrix A be put into 
row echelon form, because by then the number of non-zero rows is 
determined. By a non-zero row, we simply mean one that contains 
entries other than 0. Since every non-zero row of a matrix in row echelon 
form begins with a leading one, this is equivalent to the following 
definition. 


Definition 4.2 The rank, rank(A), of a matrix A is the number of 
leading ones in a row echelon matrix obtained from A by elementary 
row operations. 


Generally, if A is an m x n matrix, then the number of non-zero rows 
(the number of leading ones) in a row echelon form of A can certainly 
be no more than the total number of rows, m. Furthermore, since the 
leading ones must be in different columns, the number of leading ones 
in the echelon form can be no more than the total number, n, of columns. 
Thus, we have: 


• For an $m \times n$ matrix $A$, $\operatorname{rank}(A) \leq \min\{m, n\}$, where $\min\{m, n\}$ denotes the smaller of the two integers $m$ and $n$.


Example 4.3 Consider the matrix 


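$$M = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 6 \end{pmatrix}.$$

Reducing this using elementary row operations, we have:

$$\begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 6 \end{pmatrix} \to \begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & -1 & -2 & 3 \\ 0 & -1 & -2 & 3 \end{pmatrix} \to \begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

This last matrix is in row echelon form and has two non-zero rows (and two leading ones), so the matrix $M$ has rank 2.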


Activity 4.4 Show that the matrix 


$$B = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 4 \end{pmatrix}$$


has rank 3. 
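
Definition 4.1 translates directly into a short procedure: reduce to row echelon form and count the non-zero rows. The Python sketch below does this with exact fractions; the function names `row_echelon` and `rank` are assumptions made for illustration, and the two test matrices are those of Example 4.3 and Activity 4.4.

```python
from fractions import Fraction

def row_echelon(matrix):
    """Return a row echelon form (with leading ones) of the given matrix."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    m, n = len(rows), len(rows[0])
    lead_row = 0
    for col in range(n):
        # Find a row at or below lead_row with a non-zero entry in this column.
        pivot = next((r for r in range(lead_row, m) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[lead_row], rows[pivot] = rows[pivot], rows[lead_row]
        lead = rows[lead_row][col]
        rows[lead_row] = [v / lead for v in rows[lead_row]]
        # Clear the entries below the leading one.
        for r in range(lead_row + 1, m):
            factor = rows[r][col]
            rows[r] = [v - factor * p for v, p in zip(rows[r], rows[lead_row])]
        lead_row += 1
        if lead_row == m:
            break
    return rows

def rank(matrix):
    """Number of non-zero rows in a row echelon form of the matrix."""
    return sum(1 for row in row_echelon(matrix) if any(v != 0 for v in row))

M = [[1, 2, 1, 1], [2, 3, 0, 5], [3, 5, 1, 6]]
B = [[1, 2, 1, 1], [2, 3, 0, 5], [3, 5, 1, 4]]
print(rank(M))  # 2
print(rank(B))  # 3
```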
4.1.2 The main theorem again


If a square matrix A of size n x n has rank n, then its reduced row 
echelon form has a leading one in every row and (since the leading 
ones are in different columns) a leading one in every column. Since 
every column with a leading one has zeros elsewhere, it follows that 
the reduced echelon form of $A$ must be $I$, the $n \times n$ identity matrix.
Conversely, if the reduced row echelon form of $A$ is $I$, then, by the
definition of rank, $A$ has rank $n$. The main theoretical result of Chapter 3
is a characterisation of invertible matrices. We can now add to the 
main theorem, Theorem 3.8, and to Theorem 3.39, one more equivalent 
statement characterising invertibility. This leads to the following result: 


Theorem 4.5 If $A$ is an $n \times n$ matrix, then the following statements are equivalent.

• $A^{-1}$ exists.
• $A\mathbf{x} = \mathbf{b}$ has a unique solution for any $\mathbf{b} \in \mathbb{R}^n$.
• $A\mathbf{x} = \mathbf{0}$ has only the trivial solution, $\mathbf{x} = \mathbf{0}$.
• The reduced echelon form of $A$ is $I$.
• $|A| \neq 0$.
• The rank of $A$ is $n$.


4.2 Rank and systems of linear equations 
4.2.1 General solution and rank 


Recall that to solve a system of linear equations using Gaussian elimination, we form the augmented matrix and reduce it to echelon form
by using elementary row operations. We will look at some examples to 
review what we learned in Chapter 2, and link this to the concept of the 
rank of a matrix. 


Example 4.6 Consider the system of equations 


$$\begin{aligned} x_1 + 2x_2 + x_3 &= 1 \\ 2x_1 + 3x_2 &= 5 \\ 3x_1 + 5x_2 + x_3 &= 4. \end{aligned}$$

The augmented matrix is the same as the matrix $B$ in the previous activity. When you reduced $B$ to find the rank, after two steps you found

$$\begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 4 \end{pmatrix} \to \begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & -1 & -2 & 3 \\ 0 & -1 & -2 & 1 \end{pmatrix} \to \begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & -2 \end{pmatrix}.$$

Thus, the original system of equations is equivalent to the system

$$\begin{aligned} x_1 + 2x_2 + x_3 &= 1 \\ x_2 + 2x_3 &= -3 \\ 0x_1 + 0x_2 + 0x_3 &= -2. \end{aligned}$$

This system has no solutions, since there are no values of $x_1, x_2, x_3$ that
satisfy the last equation, which reduces to the false statement '0 = -2'
whatever values we give the unknowns. We deduce, therefore, that the 
original system has no solutions and we say that it is inconsistent. In 
this case, there is no reason to reduce the matrix further. 

Continuing with our example, the coefficient matrix, A, consists of 
the first three columns of the augmented matrix, and the row echelon 
form of A consists of the first three columns of the row echelon form 
of the augmented matrix: 


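$$A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 3 & 0 \\ 3 & 5 & 1 \end{pmatrix} \to \cdots \to \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix},$$

$$(A \mid \mathbf{b}) = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 4 \end{pmatrix} \to \cdots \to \begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$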


The rank of the coefficient matrix A is 2, but the rank of the augmented 
matrix (A |b) is 3. 


If, as in Example 4.6, the row reduction of an augmented matrix has a row of the kind $(0 \; 0 \; \dots \; 0 \; a)$, with $a \neq 0$, then the original system is equivalent to one in which there is an equation

$$0x_1 + 0x_2 + \cdots + 0x_n = a \qquad (a \neq 0),$$

which clearly cannot be satisfied by any values of the $x_i$, and the system
is inconsistent. Then the row echelon form of the augmented matrix will
have a row of the form $(0 \; 0 \; \dots \; 0 \; 1)$, and there will be one more leading
one than in the row echelon form of the coefficient matrix. Therefore, 
the rank of the augmented matrix will be greater than the rank of the 
coefficient matrix. If the system is consistent, there will be no leading 
one in the last column of the augmented matrix and the ranks will be 
the same. In other words, we have the following result: 
• A system $A\mathbf{x} = \mathbf{b}$ is consistent if and only if the rank of the augmented matrix is precisely the same as the rank of the matrix $A$.
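
This criterion is easy to test mechanically. The following Python sketch compares the rank of $A$ with the rank of $(A \mid \mathbf{b})$; the function names `rank` and `is_consistent` are assumptions for illustration. It uses the coefficient matrix of Example 4.6, first with the right-hand side $(1, 5, 4)^T$ of that example and then with $(1, 5, 6)^T$, the last column of the matrix $M$ of Example 4.3.

```python
from fractions import Fraction

def rank(matrix):
    """Count the pivots found during forward elimination; this equals the rank."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    m, n = len(rows), len(rows[0])
    r = 0
    for col in range(n):
        pivot = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, m):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [v - factor * p for v, p in zip(rows[i], rows[r])]
        r += 1
        if r == m:
            break
    return r

def is_consistent(a, b):
    """A x = b is consistent exactly when rank(A) equals rank(A | b)."""
    augmented = [row + [rhs] for row, rhs in zip(a, b)]
    return rank(a) == rank(augmented)

A = [[1, 2, 1], [2, 3, 0], [3, 5, 1]]
print(is_consistent(A, [1, 5, 4]))  # False: the ranks are 2 and 3
print(is_consistent(A, [1, 5, 6]))  # True: both ranks are 2
```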


Example 4.7 In contrast, consider the system of equations 


$$\begin{aligned} x_1 + 2x_2 + x_3 &= 1 \\ 2x_1 + 3x_2 &= 5 \\ 3x_1 + 5x_2 + x_3 &= 6. \end{aligned}$$


This system has the same coefficient matrix A as in Example 4.6, and 
the rank of A is 2. The augmented matrix for the system is the matrix 
M in Example 4.3, which also has rank 2, so this system is consistent. 


Activity 4.8 Write down a general solution for this system. Note that 
since the rank is 2 and there are three columns in A, there is a free 
variable and therefore there are infinitely many solutions. 


Now suppose we have an m x n matrix A which has rank m. Then there 
will be a leading one in every row of an echelon form of A, and in this 
case a system of equations Ax = b will never be inconsistent. Why? 
There are two ways to see this. In the first place, if there is a leading one 
in every row of A, the augmented matrix (A|b) can never have a row of 
the form (0 0 ... 0 1). Second, the augmented matrix also has m rows 
(since its size is m x (n + 1)), so the rank of (A|b) can never be more 
than m. So we have the following observation: 


• If an $m \times n$ matrix $A$ has rank $m$, the system of linear equations $A\mathbf{x} = \mathbf{b}$ will be consistent for all $\mathbf{b} \in \mathbb{R}^m$.


Example 4.9 Suppose that 


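$$B = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 4 \end{pmatrix}$$

is the coefficient matrix of a system of three equations in four unknowns, $B\mathbf{x} = \mathbf{d}$, with $\mathbf{d} \in \mathbb{R}^3$. This matrix $B$ is the same as that of Activity 4.4 and we determined its row echelon form in Example 4.6 (where it was, differently, interpreted as an augmented matrix of a system of three equations in three unknowns):

$$B = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 4 \end{pmatrix} \to \cdots \to \begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$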


The matrix B is 3 x 4 and has rank 3, so as we argued above, the system 
of equations Bx = d is always consistent. 
Now let's look at the solutions. Any augmented matrix $(B \mid \mathbf{d})$ will be row equivalent to a matrix in echelon form for which the first four columns are the same as the echelon form of $B$; that is,

$$(B \mid \mathbf{d}) \to \cdots \to \left(\begin{array}{cccc|c} 1 & 2 & 1 & 1 & p_1 \\ 0 & 1 & 2 & -3 & p_2 \\ 0 & 0 & 0 & 1 & p_3 \end{array}\right)$$

for some constants $p_i$, which could be zero. This system will have infinitely many solutions for any $\mathbf{d} \in \mathbb{R}^3$, because the number of columns is greater than the rank of $B$. There is one column without a leading one, so there is one non-leading variable.


Activity 4.10 If $p_1 = 1$, $p_2 = -2$ and $p_3 = 0$, and
$\mathbf{x} = (x_1, x_2, x_3, x_4)^T$,


write down the solution to the given system Bx = d in vector form, and 
use it to determine the original vector d. 


If we have a consistent system such that the rank r is strictly less than 
n, the number of unknowns, then as illustrated in Example 4.9, the 
system in reduced row echelon form (and hence the original one) does 
not provide enough information to specify the values of $x_1, x_2, \dots, x_n$
uniquely and we will have infinitely many solutions. Let’s consider this 
in more detail. 


Example 4.11 Suppose we are given a system for which the augmented 
matrix reduces to the row echelon form 


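$$\begin{pmatrix} 1 & 3 & -2 & 0 & 0 & 0 \\ 0 & 0 & 1 & 2 & 3 & 1 \\ 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

Here the rank (number of non-zero rows) is $r = 3$, which is strictly less than the number of unknowns, $n = 5$.
Continuing to reduced row echelon form, we obtain the matrix

$$\begin{pmatrix} 1 & 3 & 0 & 4 & 0 & -28 \\ 0 & 0 & 1 & 2 & 0 & -14 \\ 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$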


Activity 4.12 Verify this. What are the additional two row operations 
which need to be carried out? 
The corresponding system is

$$\begin{aligned} x_1 + 3x_2 + 4x_4 &= -28 \\ x_3 + 2x_4 &= -14 \\ x_5 &= 5. \end{aligned}$$


The variables x1, x3 and x5 correspond to the columns with the leading 
ones and are the leading variables. The other variables are the non- 
leading variables. 

The form of these equations tells us that we can assign any values to $x_2$ and $x_4$, and then the leading variables will be determined. Explicitly, if we give $x_2$ and $x_4$ the arbitrary values $s$ and $t$, where $s, t$ represent any real numbers, the solution is given by

$$x_1 = -28 - 3s - 4t, \quad x_2 = s, \quad x_3 = -14 - 2t, \quad x_4 = t, \quad x_5 = 5.$$

There are infinitely many solutions because the so-called 'free variables' $x_2, x_4$ can take any values $s, t \in \mathbb{R}$.
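
The passage from row echelon form to reduced row echelon form, and the split into leading and free variables, can be sketched in Python as follows. The function name `rref` is an assumption for illustration; the input is the row echelon matrix of Example 4.11, whose last column is the right-hand side.

```python
from fractions import Fraction

def rref(matrix):
    """Return (reduced row echelon form, list of pivot column indices)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    m, n = len(rows), len(rows[0])
    pivots = []
    r = 0
    for col in range(n):
        pivot = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][col]
        rows[r] = [v / lead for v in rows[r]]
        # Clear every other entry in the pivot column, above and below.
        for i in range(m):
            if i != r:
                factor = rows[i][col]
                rows[i] = [v - factor * p for v, p in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
        if r == m:
            break
    return rows, pivots

echelon = [[1, 3, -2, 0, 0, 0],
           [0, 0, 1, 2, 3, 1],
           [0, 0, 0, 0, 1, 5],
           [0, 0, 0, 0, 0, 0]]
reduced, pivots = rref(echelon)

n_unknowns = 5  # the sixth column is the right-hand side
leading = [c + 1 for c in pivots if c < n_unknowns]
free = [c + 1 for c in range(n_unknowns) if c not in pivots]
print(leading)  # [1, 3, 5]
print(free)     # [2, 4]
```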


Generally, we can describe what happens when the row echelon form has $r < n$ non-zero rows $(0 \; 0 \; \dots \; 0 \; 1 \; * \; * \; \dots \; *)$. If the leading one is in the $k$th column, it is the coefficient of the variable $x_k$. So if the rank is $r$ and the leading ones occur in columns $c_1, c_2, \dots, c_r$, then the general solution to the system can be expressed in a form where the unknowns $x_{c_1}, x_{c_2}, \dots, x_{c_r}$ (the leading variables) are given in terms of the other $n - r$ unknowns (the non-leading variables), and those $n - r$ unknowns are free to take any values. In Example 4.11, we have $n = 5$ and $r = 3$, and the three variables $x_1, x_3, x_5$ can be expressed in terms of the $5 - 3 = 2$ free variables $x_2, x_4$.

If r =n, where the number of leading ones r in the echelon form 
is equal to the number of unknowns n, there is a leading one in every 
column since the leading ones move to the right as we go down the 
rows. In this case, a unique solution is obtained from the reduced 
echelon form. In fact, this can be thought of as a special case of the 
more general one discussed above: since r = n, there are n —r = 0 
free variables, and the solution is therefore unique. 

We can now summarise our conclusions thus far concerning a gen- 
eral linear system of m equations in n variables, written as Ax = b, 
where the coefficient matrix A is an m x n matrix of rank r: 


• If the echelon form of the augmented matrix has a row $(0 \; 0 \; \dots \; 0 \; 1)$, the original system is inconsistent; it has no solutions. In this case, $\operatorname{rank}(A) = r < m$ and $\operatorname{rank}(A \mid \mathbf{b}) = r + 1$.
• If the echelon form of the augmented matrix has no rows of the above type, the system is consistent, and the general solution involves $n - r$ free variables, where $r$ is the rank of the coefficient matrix. When $r < n$, there are infinitely many solutions, but when $r = n$ there are no free variables and so there is a unique solution.


A homogeneous system of m equations in n unknowns is always con- 
sistent. In this case, the last statement still applies. 


• The general solution of a homogeneous system involves $n - r$ free variables, where $r$ is the rank of the coefficient matrix. When $r < n$ there are infinitely many solutions, but when $r = n$ there are no free variables and so there is a unique solution, namely the trivial solution, $\mathbf{x} = \mathbf{0}$.


4.2.2 General solution in vector notation 


Continuing with Example 4.11, we found the general solution of the 
linear system in terms of the two free variables, or parameters, s and t. 
Expressing the solution, x, as a column vector, we have 


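$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} -28 - 3s - 4t \\ s \\ -14 - 2t \\ t \\ 5 \end{pmatrix} = \begin{pmatrix} -28 \\ 0 \\ -14 \\ 0 \\ 5 \end{pmatrix} + \begin{pmatrix} -3s \\ s \\ 0 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} -4t \\ 0 \\ -2t \\ t \\ 0 \end{pmatrix}.$$

That is, the general solution is

$$\mathbf{x} = \mathbf{p} + s\mathbf{v}_1 + t\mathbf{v}_2, \qquad s, t \in \mathbb{R},$$

where

$$\mathbf{p} = \begin{pmatrix} -28 \\ 0 \\ -14 \\ 0 \\ 5 \end{pmatrix}, \quad \mathbf{v}_1 = \begin{pmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \end{pmatrix}.$$

Applying the same method, more generally, to a consistent system of rank $r$ with $n$ unknowns, we can express the general solution of a consistent system $A\mathbf{x} = \mathbf{b}$ in the form

$$\mathbf{x} = \mathbf{p} + a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_{n-r}\mathbf{v}_{n-r}.$$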



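Note that if we put all the $a_i$ equal to 0, we get a solution $\mathbf{x} = \mathbf{p}$, which means that $A\mathbf{p} = \mathbf{b}$, so $\mathbf{p}$ is a particular solution of the system. Putting $a_1 = 1$ and the remaining $a_i$ equal to 0, we get a solution $\mathbf{x} = \mathbf{p} + \mathbf{v}_1$, which means that $A(\mathbf{p} + \mathbf{v}_1) = \mathbf{b}$. Thus,

$$\mathbf{b} = A(\mathbf{p} + \mathbf{v}_1) = A\mathbf{p} + A\mathbf{v}_1 = \mathbf{b} + A\mathbf{v}_1.$$

Comparing the first and last expressions, we see that $A\mathbf{v}_1 = \mathbf{0}$. Clearly, the same equation holds for $\mathbf{v}_2, \dots, \mathbf{v}_{n-r}$. So we have proved the following:

• If $A$ is an $m \times n$ matrix of rank $r$, the general solution of $A\mathbf{x} = \mathbf{b}$ is the sum of:
  • a particular solution $\mathbf{p}$ of the system $A\mathbf{x} = \mathbf{b}$ and
  • a linear combination $a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_{n-r}\mathbf{v}_{n-r}$ of solutions $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_{n-r}$ of the homogeneous system $A\mathbf{x} = \mathbf{0}$.
• If $A$ has rank $n$, then $A\mathbf{x} = \mathbf{0}$ only has the solution $\mathbf{x} = \mathbf{0}$, and so $A\mathbf{x} = \mathbf{b}$ has a unique solution: $\mathbf{p} + \mathbf{0} = \mathbf{p}$.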


This is a more precise form of the result of Theorem 2.29, which states that all solutions of a consistent system $A\mathbf{x} = \mathbf{b}$ are of the form $\mathbf{x} = \mathbf{p} + \mathbf{z}$ where $\mathbf{p}$ is any solution of $A\mathbf{x} = \mathbf{b}$ and $\mathbf{z} \in N(A)$, the null space of $A$ (the set of all solutions of $A\mathbf{x} = \mathbf{0}$).
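
The structure 'particular solution plus null space' can be checked numerically for Example 4.11. In the Python sketch below, `A` and `b` are the coefficient part and right-hand side of the non-zero rows of the reduced row echelon form found in that example (an equivalent system, since the original equations are not given), and the helper names `mat_vec` and `general_solution` are assumptions for illustration.

```python
def mat_vec(a, x):
    """Matrix-vector product A x, one inner product per row of A."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

# Equivalent system taken from the reduced row echelon form of Example 4.11.
A = [[1, 3, 0, 4, 0],
     [0, 0, 1, 2, 0],
     [0, 0, 0, 0, 1]]
b = [-28, -14, 5]

p = [-28, 0, -14, 0, 5]   # particular solution
v1 = [-3, 1, 0, 0, 0]     # solutions of the homogeneous system
v2 = [-4, 0, -2, 1, 0]

def general_solution(s, t):
    """Return p + s*v1 + t*v2 for the chosen parameter values."""
    return [pi + s * a + t * c for pi, a, c in zip(p, v1, v2)]

print(mat_vec(A, p))                        # [-28, -14, 5], which is b
print(mat_vec(A, v1), mat_vec(A, v2))       # both are the zero vector
print(mat_vec(A, general_solution(2, -3)))  # again b, for s = 2 and t = -3
```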


4.3 Range 


The range of a matrix A is defined as follows: 


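Definition 4.13 (Range of a matrix) Suppose that $A$ is an $m \times n$ matrix. Then the range of $A$, denoted by $R(A)$, is the subset of $\mathbb{R}^m$ given by

$$R(A) = \{A\mathbf{x} \mid \mathbf{x} \in \mathbb{R}^n\}.$$

That is, the range is the set of all vectors $\mathbf{y} \in \mathbb{R}^m$ of the form $\mathbf{y} = A\mathbf{x}$ for some $\mathbf{x} \in \mathbb{R}^n$.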


What is the connection between the range of a matrix $A$ and a system of linear equations $A\mathbf{x} = \mathbf{b}$? If $A$ is $m \times n$, then $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{b} \in \mathbb{R}^m$. If the system $A\mathbf{x} = \mathbf{b}$ is consistent, then this means that there is a vector $\mathbf{x} \in \mathbb{R}^n$ such that $A\mathbf{x} = \mathbf{b}$, so $\mathbf{b}$ is in the range of $A$. Conversely, if $\mathbf{b}$ is in the range of $A$, then the system $A\mathbf{x} = \mathbf{b}$ must have a solution. Therefore, for an $m \times n$ matrix $A$:

• the range of $A$, $R(A)$, consists of all vectors $\mathbf{b} \in \mathbb{R}^m$ for which the system of equations $A\mathbf{x} = \mathbf{b}$ is consistent.


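Let's look at $R(A)$ from a different point of view. Suppose that the columns of $A$ are $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$, which we can indicate by writing $A = (\mathbf{c}_1 \; \mathbf{c}_2 \; \dots \; \mathbf{c}_n)$. If $\mathbf{x} = (\alpha_1, \alpha_2, \dots, \alpha_n)^T \in \mathbb{R}^n$, then we saw in Chapter 1 (Theorem 1.38) that the product $A\mathbf{x}$ can be expressed as a linear combination of the columns of $A$, namely

$$A\mathbf{x} = \alpha_1\mathbf{c}_1 + \alpha_2\mathbf{c}_2 + \cdots + \alpha_n\mathbf{c}_n.$$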


Activity 4.14 This is a good time to convince yourself (again) of this 
statement. Write out each side using $\mathbf{c}_i = (c_{1i}, c_{2i}, \dots, c_{mi})^T$ to show that

$$A\mathbf{x} = \alpha_1\mathbf{c}_1 + \alpha_2\mathbf{c}_2 + \cdots + \alpha_n\mathbf{c}_n.$$


Try to do this yourself before looking at the solution to this activity. 
This is a very important result which will be used many times in this 
text, so make sure you understand how it works. 


So, R(A), the set of all matrix products Ax, is also the set of all 
linear combinations of the columns of A. For this reason, R(A) is 
also called the column space of A. (We’ll discuss this more in the next 
chapter.) 

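If $A = (\mathbf{c}_1 \; \mathbf{c}_2 \; \dots \; \mathbf{c}_n)$, where $\mathbf{c}_i$ denotes column $i$ of $A$, then we can write

$$R(A) = \{\alpha_1\mathbf{c}_1 + \alpha_2\mathbf{c}_2 + \cdots + \alpha_n\mathbf{c}_n \mid \alpha_1, \alpha_2, \dots, \alpha_n \in \mathbb{R}\}.$$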


Example 4.15 Suppose that 


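$$A = \begin{pmatrix} 1 & 2 \\ -1 & 3 \\ 2 & 1 \end{pmatrix}.$$

Then for $\mathbf{x} = (\alpha_1, \alpha_2)^T$,

$$A\mathbf{x} = \begin{pmatrix} 1 & 2 \\ -1 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} \alpha_1 + 2\alpha_2 \\ -\alpha_1 + 3\alpha_2 \\ 2\alpha_1 + \alpha_2 \end{pmatrix} = \alpha_1 \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} + \alpha_2 \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix},$$

so

$$R(A) = \left\{ \begin{pmatrix} \alpha_1 + 2\alpha_2 \\ -\alpha_1 + 3\alpha_2 \\ 2\alpha_1 + \alpha_2 \end{pmatrix} \;\middle|\; \alpha_1, \alpha_2 \in \mathbb{R} \right\},$$

or

$$R(A) = \{\alpha_1\mathbf{c}_1 + \alpha_2\mathbf{c}_2 \mid \alpha_1, \alpha_2 \in \mathbb{R}\},$$

where $\mathbf{c}_1 = \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}$ and $\mathbf{c}_2 = \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}$ are the columns of $A$.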


Again, thinking of the connection with the system of equations 
Ax = b, we have already shown that Ax = b is consistent if and only 
if b is in the range of A, and we have now shown that R(A) is equal to 
the set of all linear combinations of the columns of A. Therefore, we 
can now assert that: 


e The system of equations Ax = b is consistent if and only if b is a 
linear combination of the columns of A. 


Example 4.16 Consider the following systems of three equations in 
two unknowns. 


$$\begin{aligned} x + 2y &= 0 \\ -x + 3y &= -5 \\ 2x + y &= 3 \end{aligned} \qquad\qquad \begin{aligned} x + 2y &= 1 \\ -x + 3y &= 5 \\ 2x + y &= 2. \end{aligned}$$

Solving these by Gaussian elimination (or any other method), you will find that the first system is consistent and the second system has no solution. The first system has the unique solution $(x, y)^T = (2, -1)^T$.


Activity 4.17 Do this. Solve each of the above systems. 


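The coefficient matrix of each of the systems is the same, and is equal to the matrix $A$ in Example 4.15. For the first system,

$$A = \begin{pmatrix} 1 & 2 \\ -1 & 3 \\ 2 & 1 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 0 \\ -5 \\ 3 \end{pmatrix}.$$

Checking the solution, you will find that

$$A\mathbf{x} = \begin{pmatrix} 1 & 2 \\ -1 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ -5 \\ 3 \end{pmatrix}$$

or

$$2 \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} - \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ -5 \\ 3 \end{pmatrix}.$$

On the other hand, it is not possible to express the vector $(1, 5, 2)^T$ as a linear combination of the column vectors of $A$. Trying to do so would lead to precisely the same set of inconsistent equations.

Notice, also, that the homogeneous system $A\mathbf{x} = \mathbf{0}$ has only the trivial solution, and that the only way to express $\mathbf{0}$ as a linear combination of the columns of $A$ is by $0\mathbf{c}_1 + 0\mathbf{c}_2 = \mathbf{0}$.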
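
The two ways of looking at $A\mathbf{x}$, row by row or as a linear combination of the columns, can be compared in a few lines of Python. This is an illustrative sketch; the helper names `mat_vec` and `combine` are assumptions, and the matrix and solution are those of Examples 4.15 and 4.16.

```python
def mat_vec(a, x):
    """Matrix-vector product computed row by row."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def combine(coeffs, vectors):
    """Return coeffs[0]*vectors[0] + coeffs[1]*vectors[1] + ... as a list."""
    result = [0] * len(vectors[0])
    for alpha, v in zip(coeffs, vectors):
        result = [r + alpha * vi for r, vi in zip(result, v)]
    return result

A = [[1, 2], [-1, 3], [2, 1]]
columns = [[row[j] for row in A] for j in range(2)]  # c1 and c2

x = [2, -1]  # solution of the first system in Example 4.16
print(mat_vec(A, x))        # [0, -5, 3]
print(combine(x, columns))  # [0, -5, 3], the same vector as 2*c1 - c2
```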


Activity 4.18 Verify all of the above statements. 


4.4 Learning outcomes 


You should now be able to: 


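• explain what is meant by the rank of a matrix

• find the rank of an $m \times n$ matrix $A$

• explain why a system of linear equations, $A\mathbf{x} = \mathbf{b}$, where $A$ is an $m \times n$ matrix, is consistent if and only if $\operatorname{rank}(A) = \operatorname{rank}((A \mid \mathbf{b}))$; and why if $\operatorname{rank}(A) = m$, then $A\mathbf{x} = \mathbf{b}$ is consistent for all $\mathbf{b} \in \mathbb{R}^m$

• explain why a general solution $\mathbf{x}$ to $A\mathbf{x} = \mathbf{b}$, where $A$ is an $m \times n$ matrix of rank $r$, is of the form

$$\mathbf{x} = \mathbf{p} + a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_{n-r}\mathbf{v}_{n-r}, \qquad a_i \in \mathbb{R};$$

specifically why there are $n - r$ arbitrary constants

• explain what is meant by the range of a matrix

• show that if $A = (\mathbf{c}_1 \; \mathbf{c}_2 \; \dots \; \mathbf{c}_n)$, and if $\mathbf{x} = (\alpha_1, \alpha_2, \dots, \alpha_n)^T \in \mathbb{R}^n$, then $A\mathbf{x} = \alpha_1\mathbf{c}_1 + \alpha_2\mathbf{c}_2 + \cdots + \alpha_n\mathbf{c}_n$

• write $\mathbf{b}$ as a linear combination of the columns of $A$ if $A\mathbf{x} = \mathbf{b}$ is consistent

• write $\mathbf{0}$ as a linear combination of the columns of $A$, and explain when it is possible to do this in some way other than using the trivial solution, $\mathbf{x} = \mathbf{0}$, with all the coefficients in the linear combination equal to 0.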


4.5 Comments on activities 


Activity 4.8 One more row operation on the row echelon form will obtain a matrix in reduced row echelon form which is row equivalent to the matrix $M$, from which the solution is found to be

$$\mathbf{x} = \begin{pmatrix} 7 \\ -3 \\ 0 \end{pmatrix} + t \begin{pmatrix} 3 \\ -2 \\ 1 \end{pmatrix}, \qquad t \in \mathbb{R}.$$
Activity 4.10 Substitute for $p_1, p_2, p_3$ in the row echelon form of the augmented matrix and then continue to reduce it to reduced row echelon form. The non-leading variable is $x_3$. Letting $x_3 = t$, the general solution is

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 5 \\ -2 \\ 0 \\ 0 \end{pmatrix} + t \begin{pmatrix} 3 \\ -2 \\ 1 \\ 0 \end{pmatrix} = \mathbf{p} + t\mathbf{v}, \qquad t \in \mathbb{R}.$$

Since $B\mathbf{p} = \mathbf{d}$, multiplying $B\mathbf{p}$, you will find that $\mathbf{d} = (1, 4, 5)^T$. (You can check all this by row reducing $(B \mid \mathbf{d})$.)


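Activity 4.14 First write out the matrix product of $A = (c_{ij})$ and $\mathbf{x}$.

$$A\mathbf{x} = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m1} & c_{m2} & \cdots & c_{mn} \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}.$$

The product is $m \times 1$; that is,

$$A\mathbf{x} = \begin{pmatrix} c_{11}\alpha_1 + c_{12}\alpha_2 + \cdots + c_{1n}\alpha_n \\ c_{21}\alpha_1 + c_{22}\alpha_2 + \cdots + c_{2n}\alpha_n \\ \vdots \\ c_{m1}\alpha_1 + c_{m2}\alpha_2 + \cdots + c_{mn}\alpha_n \end{pmatrix}$$

and can be written as a sum of $n$ vectors, each $m \times 1$:

$$A\mathbf{x} = \begin{pmatrix} c_{11}\alpha_1 \\ c_{21}\alpha_1 \\ \vdots \\ c_{m1}\alpha_1 \end{pmatrix} + \begin{pmatrix} c_{12}\alpha_2 \\ c_{22}\alpha_2 \\ \vdots \\ c_{m2}\alpha_2 \end{pmatrix} + \cdots + \begin{pmatrix} c_{1n}\alpha_n \\ c_{2n}\alpha_n \\ \vdots \\ c_{mn}\alpha_n \end{pmatrix}.$$

So,

$$A\mathbf{x} = \alpha_1 \begin{pmatrix} c_{11} \\ c_{21} \\ \vdots \\ c_{m1} \end{pmatrix} + \alpha_2 \begin{pmatrix} c_{12} \\ c_{22} \\ \vdots \\ c_{m2} \end{pmatrix} + \cdots + \alpha_n \begin{pmatrix} c_{1n} \\ c_{2n} \\ \vdots \\ c_{mn} \end{pmatrix}.$$

That is,

$$A\mathbf{x} = \alpha_1\mathbf{c}_1 + \alpha_2\mathbf{c}_2 + \cdots + \alpha_n\mathbf{c}_n.$$

All these steps are reversible, so any expression

$$\alpha_1\mathbf{c}_1 + \alpha_2\mathbf{c}_2 + \cdots + \alpha_n\mathbf{c}_n$$

can be written in matrix form as $A\mathbf{x}$, where $A = (c_{ij})$ and $\mathbf{x} = (\alpha_1, \alpha_2, \dots, \alpha_n)^T$.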


4.6 Exercises 


Exercise 4.1 Solve the following system of equations Ax = b by reduc- 
ing the augmented matrix to reduced row echelon form: 


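$$\begin{aligned} x_1 + 5x_2 + 3x_3 + 7x_4 + x_5 &= 2 \\ 2x_1 + 10x_2 + 3x_3 + 8x_4 + 5x_5 &= -5 \\ x_1 + 5x_2 + x_3 + 3x_4 + 3x_5 &= -4. \end{aligned}$$

If $r = \operatorname{rank}(A)$ and $n$ is the number of columns of $A$, show that your solution can be written in the form $\mathbf{x} = \mathbf{p} + a_1\mathbf{v}_1 + \cdots + a_{n-r}\mathbf{v}_{n-r}$ where $a_i \in \mathbb{R}$.
Show also that $A\mathbf{p} = \mathbf{b}$ and that $A\mathbf{v}_i = \mathbf{0}$ for $i = 1, \dots, n - r$.
Express the vector $\mathbf{b}$ as a linear combination of the columns of the coefficient matrix $A$. Do the same for the vector $\mathbf{0}$.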


Exercise 4.2 Find the rank of the matrix 


1 0O 1 02 
2 1 Tr r3 
Am TBS ak yee 
0 3° S220 


Determine N(A), the null space of A, and R(A), the range of A. 


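Exercise 4.3 Consider the system of linear equations $A\mathbf{x} = \mathbf{b}$ given below, where $\lambda$ and $\mu$ are constants, and

$$A = \begin{pmatrix} 1 & 2 & 0 \\ 5 & 1 & \lambda \\ 1 & -1 & 1 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 2 \\ 7 \\ \mu \end{pmatrix}.$$

Compute the determinant of $A$, $|A|$.
Determine for which values of $\lambda$ and $\mu$ this system has:

(a) a unique solution,
(b) no solutions,
(c) infinitely many solutions.

In case (a), use Cramer's rule to find the value of $z$ in terms of $\lambda$ and $\mu$. In case (c), solve the system using row operations and express the solution in vector form, $\mathbf{x} = \mathbf{p} + t\mathbf{v}$.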
Exercise 4.4 A system of linear equations Bx = d is known to have
the following general solution: 


1 —3 1 
x= +s : +t s,tER 
2 0 —l , 
0 0 1 
1 3 
Let cı = (1) be the first column of B. If d= | 5 ) find the 
2: —2 


matrix B. 


Exercise 4.5 Consider the matrix

$$A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 3 & 0 \\ 3 & 5 & 1 \end{pmatrix}.$$

Find a condition that the components of the vector $\mathbf{b} = (a, b, c)^T$ must satisfy in order for $A\mathbf{x} = \mathbf{b}$ to be consistent. Hence, or otherwise, show that $R(A)$ is a plane in $\mathbb{R}^3$, and write down a Cartesian equation of this plane.

Show that $\mathbf{d} = (1, 5, 6)^T$ is in $R(A)$. Express $\mathbf{d}$ as a linear combination of the columns of $A$. Is it possible to do this in two different ways? If the answer is yes, then do so; otherwise, justify why this is not possible.


Exercise 4.6 Consider the matrices 


l1 1 1 a 3 2 5 4 
0 1 -2 3 -6 9 —6 1 
NG cuca. BOS age ye a. ogy He 
3 1 7 5 -6 9 —4 b 


(a) Find the rank of the matrix A. Find a general solution of Ax = 0. 
Either write down a non-trivial linear combination of the column 
vectors of A which is equal to the zero vector, 0, or justify why 
this is not possible. 

Find all real numbers a and b such that b ∈ R(A), where b is
the vector given above. Write down a general solution of Ax = b.

(b) Using row operations, or otherwise, find |B|, where B is the matrix
given above. What is the rank of B?

Either write down a non-trivial linear combination of the column
vectors of B which is equal to the zero vector, 0, or justify why this is
not possible.

Find all real numbers a and b such that b ∈ R(B), the range of B,
where b is the vector given above.


4.7 Problems 


Problem 4.1 Solve the following system of equations Ax = b by reduc- 
ing the augmented matrix to reduced row echelon form: 


x_1 − x_2 + x_3 + x_4 + 2x_5 = 4
−x_1 + x_2 + x_4 − x_5 = −3
x_1 − x_2 + 2x_3 + 3x_4 + 4x_5 = 7.


Show that your solution can be written in the form x = p + sv_1 + tv_2
where Ap = b, Av_1 = 0 and Av_2 = 0.


Problem 4.2 Express the following system of linear equations in matrix 
form, as Ax = b: 

x + y + z + w = 3
y − 2z + 2w = 1
x + 3z − w = 2.


Find the general solution.


(a) Determine N(A), the null space of A.

(b) If a = (a, b, c)^T, find an equation which a, b, c must satisfy so
that a ∈ R(A), the range of A.

(c) If d = (1, 5, 3)^T, determine if the system of equations Ax = d is
consistent, and write down the general solution if it is.


Problem 4.3 Show that the following system of equations is consistent 
for any c ∈ R:


x + y − 2z = 1
2x − y + 2z = 1
cx + z = 0.


Solve the system using any matrix method (Cramer’s rule, inverse 
matrix, or Gaussian elimination) and hence write down expressions 
for x, y, z in terms of c. 


Problem 4.4 Consider the following system of equations, where λ is a
constant:

2x + y + z = 3
x − y + 2z = 3
x − 2y + λz = 4.

Determine all values of λ, if any, such that this system has:
(1) no solutions; 


(2) exactly one solution; 
(3) infinitely many solutions. 


In case (2) find the solution using either Cramer’s rule or an inverse 
matrix. In case (3) solve the system using Gaussian elimination. Express 
the solution in vector form. 


Problem 4.5 Solve the system of equations Bx = b using Gaussian 
elimination, where 


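\[
B = \begin{pmatrix} 1 & 1 & 2 & 1 \\ 2 & 3 & -1 & 1 \\ 1 & 0 & 7 & 2 \end{pmatrix}, \quad
\mathbf{b} = \begin{pmatrix} 3 \\ 2 \\ -2 \end{pmatrix}.
\]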


Show that the vector b cannot be expressed as a linear combination 
of the columns of B. 


Problem 4.6 A system of linear equations Ax = d is known to have 
the following solution: 


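\[
\mathbf{x} = \begin{pmatrix} 1 \\ 2 \\ 0 \\ -1 \\ 0 \end{pmatrix}
+ s \begin{pmatrix} 2 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}
+ t \begin{pmatrix} 1 \\ 1 \\ 0 \\ -1 \\ 1 \end{pmatrix}, \quad s, t \in \mathbb{R}.
\]

Assume that A is an m × n matrix. Let c_1, c_2, ..., c_n denote the
columns of A.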
Answer each of the following questions or, if there is insufficient 
information to answer a question, say so. 


(1) What number is n? 

(2) What number is m? 

(3) What (number) is the rank of A? 

(4) Describe the null space N(A). 

(5) Write down an expression for d as a linear combination of the 
columns of A. 

(6) Write down a non-trivial linear combination of the columns c_i
which is equal to 0.


Problem 4.7 Let 


3 ee: ee | 11 

1 0 1 2 -=l 5 
aa Ak o ae oe Meee 

1 13 5 0 4 


Solve the system of equations, Ax = b, using Gaussian elimination.
Express your solution in vector form, as x = p + α_1 v_1 + ··· + α_k v_k,
and verify that k = n − r where r is the rank of A. What is n?

If possible, express b as a linear combination of the column vectors 
of A in two different ways. 


5 


Vector spaces 


In this chapter, we study the important theoretical concept of a vector 
space. This, and the related concepts to be explored in the subsequent 
chapters, will enable us to extend and to understand more deeply what 
we’ve already learned about matrices and linear equations, and lead us 
to new and important ways to apply linear algebra. There is, necessarily, 
a bit of a step upwards in the level of ‘abstraction’, but it is worth the 
effort in order to help our fundamental understanding. 


5.1 Vector spaces 


5.1.1 Definition of a vector space 


We know that vectors of R^n can be added together and that they can
be ‘scaled’ by real numbers. That is, for every x, y ∈ R^n and every
α ∈ R, it makes sense to talk about x + y and αx. Furthermore, these
operations of addition and multiplication by a scalar (that is, multiplication
by a real number) behave and interact ‘sensibly’ in that, for
example,


α(x + y) = αx + αy,
α(βx) = (αβ)x,
x + y = y + x,


and so on. 

But it is not only vectors in R^n that can be added and multiplied
by scalars. There are other sets of objects for which this is possible.
Consider the set F of all functions from R to R. Then any two of these
functions can be added; given f, g ∈ F, we simply define the function
f + g by

(f + g)(x) = f(x) + g(x).


Also, for any α ∈ R, the function αf is given by


(αf)(x) = αf(x).


These operations of addition and scalar multiplication are sometimes 
known as pointwise addition and pointwise scalar multiplication. This 
might seem a bit abstract, but think about what the functions x + x^2
and 2x represent: the former is the function x plus the function x^2, and
the latter is the function x multiplied by the scalar 2. So this is just
a different way of looking at something with which you are already
familiar. It turns out that F and its rules for addition and multiplication
by a scalar satisfy the same key properties as the set of vectors in R^n with
its addition and scalar multiplication. We refer to a set with an addition
and scalar multiplication which behave appropriately as a vector space. 
We now give the formal definition of a vector space. 


Definition 5.1 (Vector space) A (real) vector space V is a non-empty
set equipped with an addition operation and a scalar multiplication
operation such that for all α, β ∈ R and all u, v, w ∈ V:


1. u + v ∈ V (closure under addition).
2. u + v = v + u (the commutative law for addition).
3. u + (v + w) = (u + v) + w (the associative law for addition).
4. there is a single member 0 of V, called the zero vector, such that
for all v ∈ V, v + 0 = v.
5. for every v ∈ V there is an element w ∈ V (usually written as
−v), called the negative of v, such that v + w = 0.
6. αv ∈ V (closure under scalar multiplication).
7. α(u + v) = αu + αv (distributive law).
8. (α + β)v = αv + βv (distributive law).
9. α(βv) = (αβ)v (associative law for scalar multiplication).
10. 1v = v.


The list of properties in the definition, called axioms, is the shortest
possible list which will enable any vector space V to behave the
way we would like it to behave with respect to addition and scalar
multiplication; that is, like R^n. Other properties which we would expect
to be true follow from those listed in the definition. For instance, we
can see that 0x = 0 for all x, as follows:


0x = (0 + 0)x = 0x + 0x

by axiom 8; so, adding the negative −0x of 0x to each side,


0 = 0x + (−0x) = (0x + 0x) + (−0x) = 0x + (0x + (−0x))
  = 0x + 0 = 0x


by axioms 5, 3, 5 again, and 4. The proof may seem a bit contrived, but
just remember the result:


0x = 0.


This would be easy to show in R^n because we know what the vector 0
looks like, namely 0 = (0, 0, ..., 0)^T. But because we want to show it
is true in any vector space, V, we have to derive this property directly 
from the definition. Once we’ve established a result, we can use it to 
prove other properties which hold in a vector space V. 


Activity 5.2 Prove that for any vector x in a vector space V, 
(−1)x = −x,


the negative of the vector x, using a similar argument with 0 = 1 + (−1).
If you’re feeling confident, show that α0 = 0 for any α ∈ R.


Note that the definition of a vector space says nothing at all about 
multiplying together two vectors, or an inner product. The only oper- 
ations with which the definition is concerned are addition and scalar 
multiplication. 

A vector space as we have defined it is called a real vector space, to 
emphasise that the ‘scalars’ α, β and so on are real numbers rather
than (say) complex numbers. There is a notion of complex vector 
space, where the scalars are complex numbers, which we shall cover in 
Chapter 13. 

In the discussions that follow, be aware of whether we are talking
about R^n or about an abstract vector space V.


5.1.2 Examples


Example 5.3 The set R^n is a vector space with the usual way of adding
and scalar multiplying vectors.


Example 5.4 The set V = {0} consisting only of the zero vector is a
vector space, with addition defined by 0 + 0 = 0, and scalar multiplication
defined by α0 = 0 for all α ∈ R.


Example 5.5 The set F of functions from R to R with pointwise
addition and scalar multiplication (described earlier in this section) is a 
vector space. Note that the zero vector in this space is the function that 
maps every real number to 0 — that is, the identically zero function. 
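
The pointwise operations can be tried out directly. The following is a minimal Python sketch, not part of the formal development: the names add, scale and zero are our own illustrative choices, and checking a few sample points only illustrates the axioms; it does not prove them.

```python
from typing import Callable

RealFunction = Callable[[float], float]


def add(f: RealFunction, g: RealFunction) -> RealFunction:
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)


def scale(alpha: float, f: RealFunction) -> RealFunction:
    """Pointwise scalar multiple: (alpha f)(x) = alpha * f(x)."""
    return lambda x: alpha * f(x)


def zero(x: float) -> float:
    """The zero vector of F: the identically zero function."""
    return 0.0


f = lambda x: x        # the function x
g = lambda x: x ** 2   # the function x^2

h = add(f, g)          # the function x + x^2
k = scale(2, f)        # the function 2x

# Spot-check the definitions and two of the axioms at a few points.
for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    assert h(x) == x + x ** 2
    assert k(x) == 2 * x
    assert add(f, zero)(x) == f(x)              # axiom 4 at the point x
    assert add(f, scale(-1, f))(x) == zero(x)   # axiom 5 at the point x
```

Note that add and scale return new functions, which mirrors the idea that f + g and αf are themselves vectors in F.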


Activity 5.6 Show that all 10 axioms of a vector space are satisfied. In
particular, if the function f is a vector in this space, what is the vector
−f?


Example 5.7 The set of m x n matrices with real entries is a vector 
space, with the usual addition and scalar multiplication of matrices. The 
‘zero vector’ in this vector space is the zero m x n matrix, which has 
all entries equal to 0. 


Example 5.8 The set S of all infinite sequences of real numbers,
y = {y_1, y_2, ..., y_n, ...}, y_i ∈ R, is a vector space. We can also use
the notation y = {y_n}, n ≥ 1 for a sequence. For example, the sequence
y = {1, 2, 4, 8, 16, 32, ...} can also be represented as {y_n} with
y_n = 2^(n−1), n ≥ 1.

The operation of addition of sequences is by adding components.
If y, z ∈ S,


y = {y_1, y_2, ..., y_n, ...},  z = {z_1, z_2, ..., z_n, ...},


then


y + z = {y_1 + z_1, y_2 + z_2, ..., y_n + z_n, ...}.


Multiplication by a scalar α ∈ R is defined in a similar way, by


αy = {αy_1, αy_2, ..., αy_n, ...}.


These operations satisfy all the requirements for S to be a vector space.
The sum and scalar multiple of an infinite sequence as defined above
is again an infinite sequence. The zero vector is the sequence consisting
entirely of zeros, and the negative of y = {y_n} is −y = {−y_n}.
The remaining axioms are satisfied because the components of a
sequence are real numbers. For example, using the notation y = {y_n},
z = {z_n}, n ≥ 1,


y + z = {y_n + z_n} = {z_n + y_n} = z + y.


Activity 5.9 If it is not immediately clear to you that all ten axioms are 
satisfied, then try to write down proofs for some of them. 


The following example concerns a subset of R^3.


Example 5.10 Let W be the set of all vectors in R^3 with the third entry
equal to 0; that is,

\[
W = \left\{ \begin{pmatrix} x \\ y \\ 0 \end{pmatrix} \;\middle|\; x, y \in \mathbb{R} \right\}.
\]

Then W is a vector space with the usual addition and scalar multiplication.
To verify this, we need only check that W is non-empty and closed
under addition and scalar multiplication. Why is this so? The axioms 2,
3, 7, 8, 9, 10 will hold for vectors in W because they hold for all vectors
in R^3, and if W is closed under addition and scalar multiplication, then
all linear combinations of vectors in W are still in W. Furthermore, if
we can show that W is closed under scalar multiplication, then for any
particular v ∈ W, 0v = 0 ∈ W and (−1)v = −v ∈ W. So we simply
need to check that W ≠ ∅ (W is non-empty), that if u, v ∈ W, then
u + v ∈ W, and if α ∈ R and v ∈ W, then αv ∈ W. Each of these is
easy to check.


Activity 5.11 Verify that W ≠ ∅, and that for u, v ∈ W and α ∈ R,
u + v ∈ W and αv ∈ W.


5.1.3 Linear combinations 


For vectors v_1, v_2, ..., v_k in a vector space V, the vector

v = α_1 v_1 + α_2 v_2 + ··· + α_k v_k


is known as a linear combination of the vectors v_1, v_2, ..., v_k. The
scalars α_i are called coefficients. The structure of a vector space is
designed for us to work with linear combinations of vectors. 


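Example 5.12 Suppose we want to express the vector w = (2, −5)^T
in R^2 as a linear combination of the vectors v_1 = (1, 2)^T and
v_2 = (1, −1)^T. Then we solve the system of linear equations given
by the components of the vector equation

\[
\begin{pmatrix} 2 \\ -5 \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \beta \begin{pmatrix} 1 \\ -1 \end{pmatrix}
\]

to obtain α = −1 and β = 3. Then w = −v_1 + 3v_2, which is easily
checked by performing the scalar multiplication and addition:

\[
\begin{pmatrix} 2 \\ -5 \end{pmatrix} = -\begin{pmatrix} 1 \\ 2 \end{pmatrix} + 3 \begin{pmatrix} 1 \\ -1 \end{pmatrix}.
\]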
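
The calculation in Example 5.12 can also be checked mechanically. The sketch below is one possible way to do it, assuming we solve the 2 × 2 system by Cramer's rule with exact fractions; the function name coefficients_2d is a hypothetical helper of our own, not notation used elsewhere in the text.

```python
from fractions import Fraction


def coefficients_2d(v1, v2, w):
    """Solve alpha*v1 + beta*v2 = w in R^2 by Cramer's rule.

    Returns (alpha, beta), or None when the determinant of the
    coefficient matrix (columns v1, v2) is zero, in which case
    there is no unique solution.
    """
    det = Fraction(v1[0]) * v2[1] - Fraction(v2[0]) * v1[1]
    if det == 0:
        return None
    alpha = (Fraction(w[0]) * v2[1] - Fraction(v2[0]) * w[1]) / det
    beta = (Fraction(v1[0]) * w[1] - Fraction(w[0]) * v1[1]) / det
    return alpha, beta


v1, v2, w = (1, 2), (1, -1), (2, -5)
alpha, beta = coefficients_2d(v1, v2, w)

# Check the answer by performing the scalar multiplication and addition.
combination = tuple(alpha * a + beta * b for a, b in zip(v1, v2))
assert combination == w
print(alpha, beta)
```

For the vectors of Example 5.12, this reproduces α = −1 and β = 3.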

Activity 5.13 On a graph, sketch the vectors v_1 and v_2 and then sketch
the vector w as a linear combination of these. Sketch also the vector 
x= ad + v2. Do you think you can reach any point on your piece of 
paper as a linear combination of v_1 and v_2?


Example 5.14 If F is the vector space of functions from R to R,
then the function f : x ↦ 2x^2 + 3x + 4 can be expressed as a linear
combination of three simpler functions, f = 2g + 3h + 4k, where
g : x ↦ x^2, h : x ↦ x and k : x ↦ 1.


5.2 Subspaces


5.2.1 Definition of a subspace


Example 5.10 is informative. Arguing as we did there, if V is a vector 
space and W ⊆ V is non-empty and closed under scalar multiplication
and addition, then W too is a vector space (and we do not need to verify 
that all the other axioms hold). The formal definition of a subspace is 
as follows: 


Definition 5.15 (Subspace) A subspace W of a vector space V is a 
non-empty subset of V that is itself a vector space under the same 
operations of addition and scalar multiplication as V. 


The discussion given in Example 5.10 justifies the following important 
result: 


Theorem 5.16 Suppose V is a vector space. Then a non-empty subset 
W of V is a subspace if and only if both the following hold: 


• for all u, v ∈ W, u + v ∈ W
(that is, W is closed under addition),
• for all v ∈ W and α ∈ R, αv ∈ W
(that is, W is closed under scalar multiplication).


Activity 5.17 Write out a proof of this theorem, following the discussion
in Example 5.10.


5.2.2 Examples 
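

Example 5.18 In R^2, the lines y = 2x and y = 2x + 1 can be defined
as the sets of vectors,

\[
S = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \;\middle|\; y = 2x,\ x \in \mathbb{R} \right\}, \qquad
U = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \;\middle|\; y = 2x + 1,\ x \in \mathbb{R} \right\}.
\]

Each vector in one of the sets is the position vector of a point on that
line. We will show that the set S is a subspace of R^2, and that the set U
is not a subspace of R^2.

If v = (1, 2)^T and p = (0, 1)^T, these sets can equally well be expressed
as

S = {x | x = tv, t ∈ R},    U = {x | x = p + tv, t ∈ R}.


Activity 5.19 Show that the two descriptions of S describe the same
set of vectors.


To show S is a subspace, we need to show that it is non-empty, and we
need to show that it is closed under addition and closed under scalar
multiplication using any vectors in S and any scalar in R. We’ll use the
second set of definitions, so our line is the set of vectors

\[
S = \{ \mathbf{x} \mid \mathbf{x} = t\mathbf{v},\ t \in \mathbb{R} \}, \qquad \mathbf{v} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}.
\]

The set S is non-empty, since 0 = 0v ∈ S.
Let u, w be any vectors in S and let α ∈ R. Then

\[
\mathbf{u} = s \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \mathbf{w} = t \begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad \text{for some } s, t \in \mathbb{R}.
\]

• closure under addition:

\[
\mathbf{u} + \mathbf{w} = s \begin{pmatrix} 1 \\ 2 \end{pmatrix} + t \begin{pmatrix} 1 \\ 2 \end{pmatrix} = (s + t) \begin{pmatrix} 1 \\ 2 \end{pmatrix} \in S \quad (\text{since } s + t \in \mathbb{R}).
\]

• closure under scalar multiplication:

\[
\alpha\mathbf{u} = \alpha \left( s \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right) = (\alpha s) \begin{pmatrix} 1 \\ 2 \end{pmatrix} \in S \quad (\text{since } \alpha s \in \mathbb{R}).
\]

This shows that S is a subspace of R^2.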


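To show U is not a subspace, any one of the three following statements
(counterexamples) will suffice:

1. 0 ∉ U.
2. U is not closed under addition:

\[
\begin{pmatrix} 0 \\ 1 \end{pmatrix} \in U, \quad \begin{pmatrix} 1 \\ 3 \end{pmatrix} \in U, \quad \text{but} \quad
\begin{pmatrix} 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 1 \\ 4 \end{pmatrix} \notin U,
\]

since 4 ≠ 2(1) + 1.
3. U is not closed under scalar multiplication:

\[
\begin{pmatrix} 0 \\ 1 \end{pmatrix} \in U, \quad 2 \in \mathbb{R}, \quad \text{but} \quad
2 \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \end{pmatrix} \notin U.
\]


Activity 5.20 Show that 0 ∉ U. Why does this suffice to show that U
is not a subspace?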


The line y = 2x + 1 is an example of an affine subset, a ‘translation’ 
of a subspace. 

It is useful to visualise what is happening here by looking at the 
graphs of the lines y = 2x and y = 2x + 1. Sketch y = 2x and sketch 
the position vector of any point on the line. You will find that the 
vector lies along the line, so any scalar multiple of that position vector 
will also lie along the line, as will the sum of any two such position 
vectors. These position vectors are all still in the set S. Now sketch the 
line y = 2x + 1. First, notice that it does not contain the origin. Now 
sketch the position vector of any point on the line. You will find that 
the position vector does not lie along the line, but goes from the origin 
up to the point on the line. If you scalar multiply this vector by any 
constant α ≠ 1, it will be the position vector of a point which is not on
the line, so the resulting vector will not be in U. The same is true if you 
add together the position vectors of two points on the line. So U is not 
a subspace. 


Activity 5.21 Do these two sketches as described above.


If V is any vector space and v ∈ V, then the set


S = {αv | α ∈ R}


is a subspace of V. If v ≠ 0, then the set S defines a line through the
origin in V.


Activity 5.22 Show this. Let v be any non-zero vector in a vector space
V and show that the set


S = {αv | α ∈ R}


is a subspace.


Example 5.23 If V is a vector space, then V is a subspace of V.


Example 5.24 If V is a vector space, then the set {0} is a subspace
of V. The set {0} is not empty: it contains one vector, namely the zero
vector. It is a subspace because 0 + 0 = 0 and α0 = 0 for any α ∈ R.


5.2.3 Deciding if a subset is a subspace 


Given any subset S of a vector space V, how do you decide if it is 
a subspace? First, look carefully at the definition of S: what is the 
requirement for a vector in V to be in the subset S? Check that 0 ∈ S.
If 0 ∉ S, then you know immediately that S is not a subspace.

If 0 ∈ S, then using some vectors in the subset, see if adding them
and scalar multiplying them will give you another vector in S. 

To prove that S is a subspace, you will need to verify that it is 
closed under addition and closed under scalar multiplication for any 
vectors in S. (To represent a general vector in R^n, you will need to use
letters to represent the vector and possibly its components.) You will
need to show that the sum of two general vectors and the multiple of
a general vector by any scalar, say α ∈ R, also satisfy the definition
of S. 

To prove a set S is not a subspace, you only need to find one 
counterexample: either two particular vectors for which the sum does 
not satisfy the definition of S, or a vector for which some scalar multiple 
does not satisfy the definition of S. (For a particular vector in R^n, use
numbers.) 


Activity 5.25 Write down a general vector (using letters) and a par- 
ticular vector (using numbers) for each of the following subsets. Show 
that one of the sets is a subspace of R? and the other is not: 


(re G) 


There is an alternative characterisation of a subspace. We have seen that 
a subspace is a non-empty subset W of a vector space that is closed 
under addition and scalar multiplication, meaning that if u, v ∈ W and
α ∈ R, then both u + v and αv are in W. Now, it is fairly easy to see
that the following equivalent property characterises when W will be a 
subspace: 


A 


xE 


Theorem 5.26 A non-empty subset W of a vector space is a subspace
if and only if for all u, v ∈ W and all α, β ∈ R, we have αu + βv ∈ W.

That is, W is a subspace if it is non-empty and closed under linear
combination.


Activity 5.27 If it is not already obvious to you, show that the prop- 
erty given above is equivalent to closure under addition and scalar 
multiplication. 


To summarise, here is how you would prove that a subset W of a vector 
space is a subspace: 


• Prove that W is non-empty. Usually the easiest way is to show that
0 ∈ W.

• Prove that W is closed under addition: if u, v are any vectors in W,
then u + v ∈ W.

• Prove that W is closed under scalar multiplication: if v is any vector
in W and α is any real number, then αv ∈ W.


Alternatively, by Theorem 5.26, you could do the following:


• Prove that W is non-empty. Usually the easiest way is to show that
0 ∈ W.

• Prove that W is closed under linear combinations: if u, v are any
vectors in W, and α, β are any real numbers, then αu + βv ∈ W.


In doing either of these, your arguments have to be general: you need
u, v, α (and β, in the second approach) to be arbitrary. Simply showing
these statements for some particular vectors or numbers is not enough.
On the other hand, if you want to show that a set is not a subspace, then 
as we’ve noted above, it suffices to show how some of these properties 
fail for particular choices of vectors or scalars. 
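
The search for a counterexample can be illustrated in code. The sketch below rests on our own assumptions: find_counterexample is a hypothetical helper which tries sums and scalar multiples of a few sample vectors, applied here to the two lines of Example 5.18. Finding a failure does show that a set is not a subspace; finding none proves nothing, since a proof must treat arbitrary vectors and scalars.

```python
from fractions import Fraction
from itertools import product


def find_counterexample(in_set, samples, scalars):
    """Look for a failure of closure among the given sample vectors.

    in_set  : predicate deciding whether a vector (a tuple) lies in the subset
    samples : vectors already known to lie in the subset
    scalars : scalars to try
    Returns a tuple describing the first failure found, or None.
    """
    for u, v in product(samples, repeat=2):
        total = tuple(a + b for a, b in zip(u, v))
        if not in_set(total):
            return ("addition", u, v, total)
    for alpha, v in product(scalars, samples):
        multiple = tuple(alpha * a for a in v)
        if not in_set(multiple):
            return ("scalar multiplication", alpha, v, multiple)
    return None


in_S = lambda p: p[1] == 2 * p[0]        # the line y = 2x
in_U = lambda p: p[1] == 2 * p[0] + 1    # the line y = 2x + 1

xs = [Fraction(n, 2) for n in range(-4, 5)]
S_samples = [(x, 2 * x) for x in xs]
U_samples = [(x, 2 * x + 1) for x in xs]
scalars = [Fraction(-3), Fraction(0), Fraction(1, 2), Fraction(2)]

print(find_counterexample(in_S, S_samples, scalars))  # no failure among these samples
print(find_counterexample(in_U, U_samples, scalars))  # reports a failure of closure
```

Exact fractions are used so that the membership tests are not disturbed by rounding.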


5.2.4 Null space and range of a matrix


Suppose that A is an m × n matrix. Then the null space N(A), the set
of solutions to the homogeneous linear system Ax = 0, is a subspace
of R^n.


Theorem 5.28 For any m × n matrix A, N(A) is a subspace of R^n.


Proof: Since A is m × n, the set N(A) is a subset of R^n. To prove it is a
subspace, we have to verify that N(A) ≠ ∅, and that if u, v ∈ N(A) and
α ∈ R, then u + v ∈ N(A) and αu ∈ N(A). Since A0 = 0, 0 ∈ N(A)
and hence N(A) ≠ ∅. Suppose u, v ∈ N(A). Then, to show u + v ∈
N(A) and αu ∈ N(A), we must show that u + v and αu are solutions
of Ax = 0. We have

A(u + v) = Au + Av = 0 + 0 = 0

and


A(αu) = α(Au) = α0 = 0,


so we have shown what was needed.
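
As a numerical illustration of this proof (not a substitute for it), the following sketch takes a small matrix A and two vectors u, v of our own choosing with Au = Av = 0, and checks that several linear combinations of u and v are again solutions of Ax = 0. The helper names mat_vec and in_null_space are hypothetical.

```python
def mat_vec(A, x):
    """Product Ax for a matrix A (a list of rows) and a vector x (a list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def in_null_space(A, x):
    """True if Ax is the zero vector."""
    return all(entry == 0 for entry in mat_vec(A, x))


# An illustrative 2 x 3 matrix and two vectors of its null space.
A = [[1, 2, 3],
     [2, 4, 6]]
u = [-2, 1, 0]
v = [-3, 0, 1]
assert in_null_space(A, u) and in_null_space(A, v)

# Sums and scalar multiples of u and v stay in N(A).
for alpha in (-4, 0, 3):
    for beta in (-1, 2, 5):
        combo = [alpha * ui + beta * vi for ui, vi in zip(u, v)]
        assert in_null_space(A, combo)
```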


The null space is the set of solutions to the homogeneous linear sys- 
tem. If we instead consider the set of solutions S to a general system 
Ax = b, S is not a subspace of R^n if b ≠ 0 (that is, if the system is not
homogeneous). This is because 0 does not belong to S. However, as we
saw in Chapter 2 (Theorem 2.29), there is a relationship between S and
N(A): if x_0 is any solution of Ax = b, then


S = {x_0 + z | z ∈ N(A)},


which we may write as x_0 + N(A). S is an affine subset, a translation
of the subspace N(A).


Definition 5.29 (Affine subset) If W is a subspace of a vector space V
and x ∈ V, then the set x + W defined by


x + W = {x + w | w ∈ W}

is said to be an affine subset of V.


In general, an affine subset is not a subspace, although every subspace 
is an affine subset, as we can see by taking x = 0. 
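
The relationship between the solution set of Ax = b and the null space can be seen in a small example. In the sketch below, the matrix A, the vector b, the particular solution x0 and the null-space vector z are all illustrative choices of ours, and mat_vec is a hypothetical helper.

```python
def mat_vec(A, x):
    """Product Ax for a matrix A (a list of rows) and a vector x (a list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


A = [[1, 2, 3],
     [2, 4, 6]]
b = [6, 12]
x0 = [1, 1, 1]     # one particular solution: A x0 = b
z = [-2, 1, 0]     # a vector in N(A): A z = 0

assert mat_vec(A, x0) == b
assert mat_vec(A, z) == [0, 0]

# Every vector x0 + t z is again a solution of Ax = b ...
for t in (-2, 0, 1, 7):
    x = [x0i + t * zi for x0i, zi in zip(x0, z)]
    assert mat_vec(A, x) == b

# ... but the solution set is not closed under scalar multiplication
# when b is non-zero: doubling a solution gives a solution of Ax = 2b.
doubled = [2 * x0i for x0i in x0]
assert mat_vec(A, doubled) == [2 * bi for bi in b]
assert mat_vec(A, doubled) != b
```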


Recall that the range of an m × n matrix is


R(A) = {Ax | x ∈ R^n}.


Theorem 5.30 For any m × n matrix A, R(A) is a subspace of R^m.


Proof. Since A is m × n, the set R(A) consists of m × 1 vectors, so
it is a subset of R^m. It is non-empty since A0 = 0 ∈ R(A). We need
to show that if u, v ∈ R(A), then u + v ∈ R(A), and for any α ∈ R,
αv ∈ R(A). So suppose u, v ∈ R(A). Then for some y_1, y_2 ∈ R^n, u =
Ay_1, v = Ay_2. We need to show that u + v = Ay for some y. Well,


u + v = Ay_1 + Ay_2 = A(y_1 + y_2),

so we may take y = y_1 + y_2 to see that, indeed, u + v ∈ R(A). Next,

αv = α(Ay_2) = A(αy_2),


so αv = Ay for some y (namely, y = αy_2) and hence αv ∈ R(A). Therefore,
R(A) is a subspace of R^m.


5.3 Linear span


Recall that by a linear combination of vectors v_1, v_2, ..., v_k we mean
a vector of the form


v = α_1 v_1 + α_2 v_2 + ··· + α_k v_k,


for some constants α_i ∈ R. If we add together two vectors of this form,
we get another linear combination of the vectors v_1, v_2, ..., v_k. The
same is true of any scalar multiple of v.


Activity 5.31 Show this; show that if v = α_1 v_1 + α_2 v_2 + ··· + α_k v_k
and w = β_1 v_1 + β_2 v_2 + ··· + β_k v_k, then v + w and sv, s ∈ R, are also
linear combinations of the vectors v_1, v_2, ..., v_k.


The set of all linear combinations of a given set of vectors of a vector 
space V forms a subspace, and we give it a special name. 


Definition 5.32 (Linear span) Suppose that V is a vector space and
that v_1, v_2, ..., v_k ∈ V. The linear span of X = {v_1, ..., v_k} is the set
of all linear combinations of the vectors v_1, ..., v_k, denoted by Lin(X)
or Lin{v_1, v_2, ..., v_k}. That is,


Lin{v_1, v_2, ..., v_k} = {α_1 v_1 + ··· + α_k v_k | α_1, α_2, ..., α_k ∈ R}.


Theorem 5.33 If X = {v_1, ..., v_k} is a set of vectors of a vector space
V, then Lin(X) is a subspace of V. It is the smallest subspace containing
the vectors v_1, v_2, ..., v_k.


Proof: The set Lin(X) is non-empty, since

0 = 0v_1 + ··· + 0v_k ∈ Lin(X).


If you have carefully carried out Activity 5.31 above, then you have
shown that Lin(X) is closed under addition and scalar multiplication.
Therefore, it is a subspace of V. Furthermore, any vector space which
contains the vectors v_1, v_2, ..., v_k must also contain all linear combinations
of these vectors, so it must contain Lin(X). That is, Lin(X) is
the smallest subspace of V containing v_1, v_2, ..., v_k.


The subspace Lin(X) is also known as the subspace spanned by the
set X = {v_1, ..., v_k}, or, simply, as the span of {v_1, v_2, ..., v_k}. If
V = Lin(X), then we say that the set {v_1, v_2, ..., v_k} spans the vector
space V.

If we know that a set of vectors {v_1, v_2, ..., v_k} spans a vector space
V, then we know that any vector w ∈ V can be expressed in some way
as a linear combination of the vectors v_1, v_2, ..., v_k. This gives us a lot
of information about the vector space V.
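
For vectors in R^n, membership of a linear span can be tested by row reduction. The sketch below is one possible approach under our own assumptions: it writes the vectors as the rows of a matrix and uses the fact that appending w leaves the rank unchanged exactly when w is already a linear combination of the given vectors. The helpers rank and in_span, and the sample vectors, are hypothetical illustrations.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (a list of rows), by Gaussian elimination over the rationals."""
    M = [[Fraction(entry) for entry in row] for row in rows]
    n_cols = len(M[0]) if M else 0
    pivots = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot position with a non-zero entry.
        pivot = next((r for r in range(pivots, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivots], M[pivot] = M[pivot], M[pivots]
        # Clear the entries below the pivot.
        for r in range(pivots + 1, len(M)):
            factor = M[r][col] / M[pivots][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[pivots])]
        pivots += 1
    return pivots


def in_span(vectors, w):
    """True if w is a linear combination of the given vectors."""
    return rank(vectors) == rank(list(vectors) + [w])


v1, v2 = (1, 0, -3), (0, 1, 2)
print(in_span([v1, v2], (2, 1, -4)))   # (2, 1, -4) = 2 v1 + v2
print(in_span([v1, v2], (0, 0, 1)))    # not a combination of v1 and v2
```

This only decides whether w lies in the span; to find the coefficients themselves, one would solve the corresponding linear system as in Example 5.12.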


5.3.1 Row space and column space of a matrix


If A is an m × n matrix, then the columns of A are vectors in R^m and the
rows of A are row vectors, 1 × n matrices. When written as a column
(that is, when transposed), a row gives an n × 1 matrix; that is, a vector
in R^n. (Recall, from Chapter 1, that, by a vector, we mean a column
vector.) We define the row space of A to be the linear span of its rows,
when written as vectors, and the column space to be the linear span of
its columns.


Definition 5.34 (Column space) If A is an m × n matrix, and if
c_1, c_2, ..., c_n denote the columns of A, then the column space of A,
CS(A), is


CS(A) = Lin{c_1, c_2, ..., c_n}.


The column space of an m × n matrix A is a subspace of R^m.


Definition 5.35 (Row space) If A is an m × n matrix, and if
r_1, r_2, ..., r_m denote the rows of A written as vectors, then the row
space of A, RS(A), is


RS(A) = Lin{r_1, r_2, ..., r_m}.


The row space of an m × n matrix A is a subspace of R^n.


We should just add a note of clarification. Our approach to the
definition of row space is slightly different from that found in some other
texts. It is perfectly valid to think of the set of row vectors, by which
we mean 1 × n matrices, as a vector space in an entirely analogous
way to R^n, with the corresponding addition and scalar multiplication.
This is simply a different ‘version’ of R^n, populated by row vectors
rather than column vectors. Then the row space could be defined as the
linear span of the rows of the matrix, and is a subspace of this vector
space. However, for our purposes, we prefer not to have to work with
two versions of R^n, and nor do we want (as some are content to do) to
make no distinction between rows and columns. It is because we want 
the row space to be a subspace of Euclidean space as we understand it 
(which entails working with column vectors) that we have defined row 
space in the way we have. 

In the previous chapter, we observed that the range, R(A), of an
m × n matrix A is equal to the set of all linear combinations of its
columns. (See Section 4.3.) That is, R(A) is equal to the linear span of
the columns of A, so


R(A) = CS(A).

Therefore, the range of A and the column space of A are precisely the
same subspace of R^m, although their original definitions are different.

On the other hand, the row space of A is a subspace of R^n. Although
the notations are similar, it is important not to confuse the row space of
a matrix, RS(A), with the range of a matrix R(A) = CS(A).

We have seen one more vector space associated with a matrix,
namely the null space, N(A), which is also a subspace of R^n.

Recall that two vectors in R^n are orthogonal if and only if their
inner product is equal to 0. The null space of a matrix A and the row
space of A are orthogonal subspaces of R^n, meaning that every vector
in the null space is orthogonal to every vector in the row space. Why is
this true? A vector x is in N(A) if and only if Ax = 0. Look at the ith
component of the product Ax. This is just the inner product of r_i with
x, where r_i is the ith row of A written as a vector. But, since Ax = 0, it
must be true that ⟨r_i, x⟩ = 0 for each i. Since any r ∈ RS(A) is some
linear combination of the spanning vectors r_1, ..., r_m, the inner product
⟨r, x⟩ equals zero for any r ∈ RS(A) and any x ∈ N(A). We restate this
important fact:


• If A is an m × n matrix, then for any r ∈ RS(A) and any x ∈ N(A),
⟨r, x⟩ = 0; that is, r and x are orthogonal.
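
This fact is easy to observe numerically. In the following sketch, the matrix A and the vector x are illustrative choices of ours with Ax = 0, and inner is a hypothetical helper computing the inner product; the check covers only a handful of linear combinations of the rows, so it illustrates the argument above rather than proving it.

```python
def inner(u, v):
    """Inner product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))


# An illustrative 2 x 3 matrix (given by its rows) and a vector x with Ax = 0.
rows = [[1, 2, 3],
        [0, 1, 1]]
x = [-1, -1, 1]

# Each component of Ax is the inner product of a row with x.
assert all(inner(r, x) == 0 for r in rows)

# Hence every linear combination of the rows is orthogonal to x.
for a1 in (-2, 0, 3):
    for a2 in (-1, 4):
        r = [a1 * p + a2 * q for p, q in zip(rows[0], rows[1])]
        assert inner(r, x) == 0
```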


Activity 5.36 Show that if {r_1, r_2, ..., r_m} is any set of vectors in R^n,
and x ∈ R^n is such that ⟨r_i, x⟩ = 0 for i = 1, ..., m, then ⟨r, x⟩ = 0
for any linear combination r = α_1 r_1 + α_2 r_2 + ··· + α_m r_m.


5.3.2 Lines and planes in R^3


What is the set Lin{v}, the linear span of a single non-zero vector
v ∈ R^n? Since the set is defined by


Lin{v} = {αv | α ∈ R},


we have already seen that Lin{v} defines a line through the origin in
R^n. In fact, in Activity 5.22 you proved directly that this is a subspace
for any vector space, V.

In Chapter 1 (Section 1.11), we saw that a plane in R^3 can be
defined either as the set of all vectors x = (x, y, z)^T whose components
satisfy a single Cartesian equation, ax + by + cz = d, or as the set
of all vectors x which satisfy a vector equation with two parameters,
x = p + sv + tw, s, t ∈ R, where v and w are non-parallel vectors and
p is the position vector of a point on the plane. These definitions are
equivalent, as it is possible to go from one representation of a given
plane to the other.

If d = 0, the plane contains the origin; so, taking p = 0, a plane 
through the origin is the set of vectors 


{x | x = sv + tw, s, t ∈ R}.


Since this is the linear span, Lin{v, w}, of two vectors in R^3, a plane
through the origin is a subspace of R^3.
Let’s look at a specific example. 


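Example 5.37 Let S be the set given by

\[
S = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; 3x - 2y + z = 0 \right\}.
\]

Then for x ∈ S,

\[
\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y \\ -3x + 2y \end{pmatrix}
= x \begin{pmatrix} 1 \\ 0 \\ -3 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}.
\]

That is, S can be expressed as the set


S = {x | x = sv_1 + tv_2, s, t ∈ R},


where v_1, v_2 are the vectors v_1 = (1, 0, −3)^T, v_2 = (0, 1, 2)^T. Since S
is the linear span of two vectors, it is a subspace of R^3. Of course, you
can show directly that S is a subspace by showing it is non-empty, and
closed under addition and scalar multiplication.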


If d ≠ 0, then the plane is not a subspace. It is an affine subset, a
translation of a subspace.


Activity 5.38 Show this in general, as follows. If a, b, c are real numbers,
not all zero, show that the set

\[
S = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; ax + by + cz = d \right\}
\]

is a subspace if d = 0 by showing that S is non-empty and that it is
closed under addition and scalar multiplication. Show, however, that if
d ≠ 0, the set S is not a subspace.


In the same way as for planes in R^3, any hyperplane in R^n which
contains the origin is a subspace of R^n. You can show this directly,
exactly as in the activity above, or you can show it is the linear span of
n − 1 vectors in R^n.


5.4 Learning outcomes 


You should now be able to: 


• explain what is meant by a vector space and a subspace

• prove that a given set is a vector space

• decide whether or not a subset of a vector space is a subspace

• prove that a subset is a subspace or show by a counterexample that
it is not a subspace

• explain what is meant by the linear span of a set of vectors

• explain what is meant by the column space and the row space of a
matrix

• explain why the range of a matrix is equal to the column space of
the matrix

• explain why the row space of a matrix is orthogonal to the null space
of the matrix.


5.5 Comments on activities 


Activity 5.2 For any x,

0 = 0x = (1 + (−1))x = 1x + (−1)x = x + (−1)x,


so adding the negative −x of x to each side, and using axioms 3 and 4
of the definition of a vector space,


−x = −x + 0 = −x + x + (−1)x = (−1)x,


which proves that −x = (−1)x.

To show that α0 = 0 for any α ∈ R, let u denote any vector in V.
Then


α0 + αu = α(0 + u) = αu.


Why? This follows from axioms 7 and 4 of the definition of a vector
space. Now add −αu to both sides of the equation, to obtain the result
that α0 = 0. Which axioms are you using to deduce this?
Activity 5.6 The axioms are not hard to check. For example, to check
axiom 2, let $f, g \in F$. Then the function $f + g$ is given by

$$(f + g)(x) = f(x) + g(x) = g(x) + f(x) = (g + f)(x),$$

since real numbers commute. But this means $f + g = g + f$. We omit
the details of the other axioms; they are all straightforward and follow
from the properties of real numbers. The negative of a function $f$ is the
function $-f$ given by $(-f)(x) = -(f(x))$ for all x.


Activity 5.9 Just write out each sequence in the shorter form, y = {yn} 
and use the properties of real numbers. 


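Activity 5.11 Clearly, $W \neq \emptyset$ since $\mathbf{0} \in W$. Suppose

$$\mathbf{u} = \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}, \quad \mathbf{v} = \begin{pmatrix} x' \\ y' \\ 0 \end{pmatrix} \in W$$

and that $\alpha \in \mathbb{R}$. Then

$$\mathbf{u} + \mathbf{v} = \begin{pmatrix} x + x' \\ y + y' \\ 0 \end{pmatrix} \in W \quad \text{and} \quad \alpha\mathbf{v} = \begin{pmatrix} \alpha x' \\ \alpha y' \\ 0 \end{pmatrix} \in W,$$

as required.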


Activity 5.13 Do the sketches as instructed. Yes, you can reach any
point in $\mathbb{R}^2$ as a linear combination of these vectors. Why? Because
you can always solve the system of linear equations resulting from the
vector equation $\mathbf{x} = \alpha\mathbf{v}_1 + \beta\mathbf{v}_2$ for $\alpha$ and $\beta$ (since the determinant of
the coefficient matrix is non-zero).


Activity 5.17 Since this is an ‘if and only if’ statement, you must prove 
it both ways. 

If W is a subspace, then certainly it is closed under addition and 
scalar multiplication. 

Now suppose that W is a non-empty subset of a vector space V,
which is closed under the addition and scalar multiplication defined on
V, so that axioms 1 and 6 are satisfied for W under these operations.
W is non-empty, so there is a vector $\mathbf{v} \in W$. Since W is closed under
scalar multiplication, then also $\mathbf{0} = 0\mathbf{v} \in W$ and $(-1)\mathbf{v} = -\mathbf{v} \in W$ for
any $\mathbf{v} \in W$. The remainder of the axioms are satisfied in W since they are
true in V, and any vector in W is also in V (and any linear combination
of vectors in W remains in W).
Activity 5.19 This follows from


aa a) oy 
w] N2) u y=2t — 
Activity 5.20 The vector 0 is not in the set U as 


0=(5)4 (1) +3) for any t € 


so axiom 4 of the definition of a vector space is not satisfied. 




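Activity 5.22 Note first that S is non-empty because $\mathbf{0} \in S$. Suppose
that $\mathbf{x}, \mathbf{y} \in S$. (Why are we carefully not using the usual symbols $\mathbf{u}$ and
$\mathbf{v}$? It is because $\mathbf{v}$ is representing a particular vector and is used in the
definition of the set S.) Suppose also that $\beta \in \mathbb{R}$. Now, because $\mathbf{x}$ and $\mathbf{y}$
belong to S, there are $\alpha, \alpha' \in \mathbb{R}$ such that $\mathbf{x} = \alpha\mathbf{v}$ and $\mathbf{y} = \alpha'\mathbf{v}$. Then,

$$\mathbf{x} + \mathbf{y} = \alpha\mathbf{v} + \alpha'\mathbf{v} = (\alpha + \alpha')\mathbf{v},$$

which is in S since it is a scalar multiple of $\mathbf{v}$. Also,

$$\beta\mathbf{x} = \beta(\alpha\mathbf{v}) = (\beta\alpha)\mathbf{v} \in S$$

and it follows that S is a subspace.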


Activity 5.25 A general vector in $S_1$ is of the form $(a, a^2, 0)^T$, $a \in \mathbb{R}$, and
one particular vector, taking x = 1, is $(1, 1, 0)^T$. A general vector in $S_2$ is
of the form $(a, 2a, 0)^T$, $a \in \mathbb{R}$, and one particular vector, taking x = 1, is
$(1, 2, 0)^T$.

Each of these subsets contains the zero vector, $\mathbf{0}$.

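The set $S_1$ is not a subspace. To show this, you need to find one
counterexample, one or two particular vectors in $S_1$ which do not satisfy
the closure properties. For example,

$$\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \in S_1 \quad \text{but} \quad 2\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 0 \end{pmatrix} \notin S_1.$$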


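The set $S_2$ is a subspace. You need to show it is closed under addition
and scalar multiplication using general vectors. Let $\mathbf{u}, \mathbf{v} \in S_2$, $\alpha \in \mathbb{R}$.
Then

$$\mathbf{u} = \begin{pmatrix} a \\ 2a \\ 0 \end{pmatrix} \quad \text{and} \quad \mathbf{v} = \begin{pmatrix} b \\ 2b \\ 0 \end{pmatrix}, \quad \text{for some } a, b \in \mathbb{R}.$$

Taking the sum and scalar multiple,

$$\mathbf{u} + \mathbf{v} = \begin{pmatrix} a \\ 2a \\ 0 \end{pmatrix} + \begin{pmatrix} b \\ 2b \\ 0 \end{pmatrix} = \begin{pmatrix} a + b \\ 2(a + b) \\ 0 \end{pmatrix} \in S_2, \quad \text{and} \quad \alpha\mathbf{u} = \begin{pmatrix} \alpha a \\ 2(\alpha a) \\ 0 \end{pmatrix} \in S_2,$$

which proves that $S_2$ is a subspace.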
Activity 5.27 Let $\mathbf{u}, \mathbf{v} \in W$. If $\alpha\mathbf{u} + \beta\mathbf{v} \in W$ for all $\alpha, \beta \in \mathbb{R}$, then
taking, first, $\alpha = \beta = 1$ and, second, $\beta = 0$, we have $\mathbf{u} + \mathbf{v} \in W$ and
$\alpha\mathbf{u} \in W$ for all $\alpha \in \mathbb{R}$. Conversely, if W is closed under addition
and scalar multiplication, then (by closure under scalar multiplication)
$\alpha\mathbf{u} \in W$ and $\beta\mathbf{v} \in W$ for all $\alpha, \beta \in \mathbb{R}$, and it follows, by closure under
addition, that $\alpha\mathbf{u} + \beta\mathbf{v} \in W$.


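Activity 5.31 Any two such vectors will be of the form

$$\mathbf{v} = \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k$$

and

$$\mathbf{v}' = \alpha_1'\mathbf{v}_1 + \alpha_2'\mathbf{v}_2 + \cdots + \alpha_k'\mathbf{v}_k$$

and we will have

$$\mathbf{v} + \mathbf{v}' = (\alpha_1 + \alpha_1')\mathbf{v}_1 + (\alpha_2 + \alpha_2')\mathbf{v}_2 + \cdots + (\alpha_k + \alpha_k')\mathbf{v}_k,$$

which is a linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$. Also,

$$\alpha\mathbf{v} = \alpha(\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k) = (\alpha\alpha_1)\mathbf{v}_1 + (\alpha\alpha_2)\mathbf{v}_2 + \cdots + (\alpha\alpha_k)\mathbf{v}_k$$

is a linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$.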


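Activity 5.36 Using properties of the inner product of two vectors in
$\mathbb{R}^n$,

$$\langle \mathbf{r}, \mathbf{x} \rangle = \langle a_1\mathbf{r}_1 + a_2\mathbf{r}_2 + \cdots + a_m\mathbf{r}_m, \mathbf{x} \rangle = a_1\langle \mathbf{r}_1, \mathbf{x} \rangle + a_2\langle \mathbf{r}_2, \mathbf{x} \rangle + \cdots + a_m\langle \mathbf{r}_m, \mathbf{x} \rangle.$$

Since $\langle \mathbf{r}_i, \mathbf{x} \rangle = 0$ for each vector $\mathbf{r}_i$, we can conclude that also $\langle \mathbf{r}, \mathbf{x} \rangle = 0$.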
Activity 5.38 It is easy to see that $S \neq \emptyset$ in either case: just list one
vector in each set. For example, $(0, 0, 0)^T \in S$ if d = 0, and, assuming
$a \neq 0$, $(d/a, 0, 0)^T \in S$ if $d \neq 0$ (or even if d = 0).

Suppose d = 0. Let $\mathbf{u}, \mathbf{v} \in S$ and $\alpha \in \mathbb{R}$. Then

$$\mathbf{u} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \quad \mathbf{v} = \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix},$$

where $ax + by + cz = 0$ and $ax' + by' + cz' = 0$. Consider $\mathbf{u} + \mathbf{v}$.
This equals

$$\begin{pmatrix} x + x' \\ y + y' \\ z + z' \end{pmatrix} = \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$

and we want to show this belongs to S. Now, this is the case, because

$$\begin{aligned} aX + bY + cZ &= a(x + x') + b(y + y') + c(z + z') \\ &= (ax + by + cz) + (ax' + by' + cz') \\ &= 0 + 0 \\ &= 0, \end{aligned}$$

and, similarly, it can be shown that for any $\alpha \in \mathbb{R}$, $\alpha\mathbf{v} \in S$. So in this
case S is closed under addition and scalar multiplication and is therefore
a subspace.

If $d \neq 0$, the simple statement that $\mathbf{0}$ does not satisfy the equation
means that in this case S is not a subspace.

(However, you can see why closure fails when d is not 0; for then,
choosing any two particular vectors for $\mathbf{u}$ and $\mathbf{v}$, if $\mathbf{u} + \mathbf{v} = (X, Y, Z)^T$,
then $aX + bY + cZ$ will equal 2d, which will not be the same as d. So
we will not have $\mathbf{u} + \mathbf{v} \in S$. Similarly, we can see that $\alpha\mathbf{v}$ will not be in
S if $\alpha \neq 1$.)
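
The closure argument above can be tried out numerically. The following is only a small sketch: the coefficients (a, b, c) = (3, -2, 1), the value d = 6 and the sample points are arbitrary illustrative choices, not taken from the activity.

```python
from fractions import Fraction


def in_plane(p, coeffs, d):
    """Return True if the point p = (x, y, z) satisfies a*x + b*y + c*z = d."""
    a, b, c = coeffs
    x, y, z = p
    return a * x + b * y + c * z == d


def add(u, v):
    """Componentwise sum of two vectors."""
    return tuple(ui + vi for ui, vi in zip(u, v))


def scale(alpha, v):
    """Scalar multiple alpha * v."""
    return tuple(alpha * vi for vi in v)


coeffs = (3, -2, 1)  # illustrative choice of a, b, c (an assumption)

# Case d = 0: two points on the plane 3x - 2y + z = 0.
u0, v0 = (1, 0, -3), (0, 1, 2)
closed_add_0 = in_plane(add(u0, v0), coeffs, 0)
closed_scale_0 = in_plane(scale(Fraction(5, 2), u0), coeffs, 0)

# Case d = 6: two points on the plane 3x - 2y + z = 6.
u6, v6 = (2, 0, 0), (0, -3, 0)
sum_in_plane_6 = in_plane(add(u6, v6), coeffs, 6)  # the sum gives 12 = 2d, not d
zero_in_plane_6 = in_plane((0, 0, 0), coeffs, 6)

print(closed_add_0, closed_scale_0, sum_in_plane_6, zero_in_plane_6)
```

A check on a few points is of course not a proof; it only illustrates what the general argument establishes.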


5.6 Exercises 


Exercise 5.1 Which of the following sets are subspaces of R?? 


(0) 
(0) 


ora} (C 
vx} afl 


seymar, 


weno}, 


Provide proofs or counterexamples to justify your answers. 
Exercise 5.2 Suppose A is an $n \times n$ matrix and $\lambda \in \mathbb{R}$ is a fixed
constant. Show that the set

$$S = \{ \mathbf{x} \mid A\mathbf{x} = \lambda\mathbf{x} \}$$

is a subspace of $\mathbb{R}^n$.


Exercise 5.3 Consider the following vectors 


vC) 0:6) -0 


(a) Show that u can be expressed as a linear combination of vı and 
v2, and write down the linear combination; but that w cannot be 
expressed as a linear combination of vı and vo. 

(b) What subspace of R? is given by Lin{v;, v2, u}? What subspace 
of R? is given by Lin{v1, v2, w}? 

(c) Show that the set {v;, v2, u, w} spans R?. Show also that any vector 
b € R? can be expressed as a linear combination of v;, v2, u, w in 
infinitely many ways. 


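Exercise 5.4 If $\mathbf{v}, \mathbf{w} \in \mathbb{R}^n$, explain the difference between the sets

$$A = \{\mathbf{v}, \mathbf{w}\} \quad \text{and} \quad B = \mathrm{Lin}\{\mathbf{v}, \mathbf{w}\}.$$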


Exercise 5.5 Let F be the vector space of all functions from $\mathbb{R} \to \mathbb{R}$
with pointwise addition and scalar multiplication. Let n be a fixed
positive integer and let $P_n$ be the set of all real polynomial functions of
degree at most n; that is, $P_n$ consists of all functions of the form

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n, \quad \text{where } a_0, a_1, \ldots, a_n \in \mathbb{R}.$$

Prove that $P_n$ is a subspace of F, under the usual pointwise addition
and scalar multiplication for real functions. Find a finite set of functions
which spans $P_n$.


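Exercise 5.6 Let U and W be subspaces of a vector space V.

(a) Show that $U \cap W$ is a subspace of V.
(b) Show that $U \cup W$ is not a subspace of V unless $U \subseteq W$ or $W \subseteq U$.

Recall the intersection of the two sets U and W is defined as

$$U \cap W = \{ \mathbf{x} : \mathbf{x} \in U \text{ and } \mathbf{x} \in W \}$$

and that the union of the two sets U and W is defined as

$$U \cup W = \{ \mathbf{x} : \mathbf{x} \in U \text{ or } \mathbf{x} \in W \}.$$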
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Give a simple example of sets U and W in R? illustrating that U N W 
is a Subspace, but for which U U W is not. 


5.7 Problems 


Problem 5.1 Which of the following are subspaces of R*? 


sho ll 
c(h} E 
s{()b-e--{8) oe} 


Provide proofs or counterexamples to justify your answers. Describe 
the sets geometrically. 


xX ty +z = i 


A 


ZE 


Problem 5.2 Suppose 


(3) =) 


Determine which of the vectors below are in Lin{u, v}, and for each 
such vector, express it as a linear combination of u and v: 


-G O AC) 


Problem 5.3 Let 


A 
O S 


Show that the set S; spans R?, but any vector v € R? can be written as 
a linear combination of the vectors in Sı in infinitely many ways. Show 
that Sy and S; do not span R?. 
Problem 5.4


(a) Solve the following equation to find the coefficients œ and f by 
finding A7!: 


(aJa E a)G)-« 
(b) Show that Lin{w,, w2} =in{ (5) , (a = R*. That is, 


show any vector b € R? can be expressed as a linear combination 
of w, and wp by solving b = Ax for x: 


Gea) 1) (G) = 


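(c) Show, in general, that if $\mathbf{v}$ and $\mathbf{w}$ are non-zero vectors in $\mathbb{R}^2$, with
$\mathbf{v} = (a, c)^T$ and $\mathbf{w} = (b, d)^T$, then

$$\mathrm{Lin}\{\mathbf{v}, \mathbf{w}\} = \mathbb{R}^2 \iff \mathbf{v} \neq t\mathbf{w} \text{ for any } t \in \mathbb{R} \iff \begin{vmatrix} a & b \\ c & d \end{vmatrix} \neq 0.$$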


Problem 5.5 Let F be the vector space of all functions from R > R 
with pointwise addition and scalar multiplication. (See Example 5.5.) 


(a) Which of the following sets are subspaces of F? 
Si={feF| fO0) =, S,={feF | fC) = 9}. 
(b) (For readers who have studied calculus) Show that the set 
S3 = {f € F | f is differentiable and f’ — f = 0} 


is a subspace of F. 


Problem 5.6 Let $M_2(\mathbb{R})$ denote the set of all $2 \times 2$ matrices with real
entries. Show that $M_2(\mathbb{R})$ is a vector space under the usual matrix
addition and scalar multiplication.


Which of the following subsets are subspaces of $M_2(\mathbb{R})$?


a 0 a l 
m={(4 5) aber}, m={(4 »)labe 


a 0 
m=i% pp) lade 


Justify your answers. 


Linear independence, bases and dimension


In this chapter, we look into the structure of vector spaces, developing
the concept of a basis. This will enable us to understand more about
a given vector space, and know precisely what we mean by its dimension.
In particular, you should then have a clear understanding of the
statement that $\mathbb{R}^n$ is an n-dimensional vector space.


6.1 Linear independence 


Linear independence is a central idea in the theory of vector spaces. If
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a set of vectors in a vector space V, then the vector
equation

$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k = \mathbf{0}$$

always has the trivial solution, $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$.

If this is the only possible solution of the vector equation, then we
say that the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are linearly independent. If there are
numbers $\alpha_1, \alpha_2, \ldots, \alpha_k$, not all zero, such that

$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k = \mathbf{0},$$

then the vectors are not linearly independent; we say they are linearly
dependent. In this case, the left-hand side is termed a non-trivial linear
combination of the vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$.

So the vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ are linearly independent if no non-trivial
linear combination of them is equal to the zero vector, or, equivalently,
if whenever

$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k = \mathbf{0},$$

then, necessarily, $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$.
We state the formal definitions now.




Definition 6.1 (Linear independence) Let V be a vector space and
$\mathbf{v}_1, \ldots, \mathbf{v}_k \in V$. Then $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are linearly independent (or
form a linearly independent set) if and only if the vector equation

$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k = \mathbf{0}$$

has the unique solution

$$\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0;$$

that is, if and only if no non-trivial linear combination of the vectors
equals the zero vector.


Definition 6.2 (Linear dependence) Let V be a vector space and
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k \in V$. Then $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are linearly dependent (or
form a linearly dependent set) if and only if there are real numbers
$\alpha_1, \alpha_2, \ldots, \alpha_k$, not all zero, such that

$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k = \mathbf{0};$$

that is, if and only if some non-trivial linear combination of the vectors
is equal to the zero vector.


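Example 6.3 In $\mathbb{R}^2$, the vectors

$$\mathbf{v} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \mathbf{w} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$

are linearly independent. Why? Well, suppose we have a linear combination
of these vectors which is equal to the zero vector:

$$\alpha\begin{pmatrix} 1 \\ 2 \end{pmatrix} + \beta\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

Then this vector equation holds if and only if

$$\begin{aligned} \alpha + \beta &= 0 \\ 2\alpha - \beta &= 0. \end{aligned}$$

This homogeneous linear system has only the trivial solution, $\alpha = 0$,
$\beta = 0$, so the vectors are linearly independent.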
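
One quick way to confirm that the system in this example has only the trivial solution is to add its two equations; the short derivation below uses only the symbols of the example.

$$(\alpha + \beta) + (2\alpha - \beta) = 0 \;\Longrightarrow\; 3\alpha = 0 \;\Longrightarrow\; \alpha = 0, \qquad \text{and then } \alpha + \beta = 0 \;\Longrightarrow\; \beta = 0.$$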


Activity 6.4 Show that the vectors 


p=(44) at a= (2) 


are linearly dependent by writing down a non-trivial linear combination 
which is equal to the zero vector. 
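Example 6.5 In $\mathbb{R}^3$, the following vectors are linearly dependent:

$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} 2 \\ 1 \\ 5 \end{pmatrix}, \quad \mathbf{v}_3 = \begin{pmatrix} 4 \\ 5 \\ 11 \end{pmatrix}.$$

This is because

$$2\mathbf{v}_1 + \mathbf{v}_2 - \mathbf{v}_3 = \mathbf{0}.$$

(Check this!)
Note that this can also be written as $\mathbf{v}_3 = 2\mathbf{v}_1 + \mathbf{v}_2$.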


This example illustrates the following general result. Try to prove it 
yourself before looking at the proof. 


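Theorem 6.6 The set $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\} \subseteq V$ is linearly dependent if and
only if at least one vector $\mathbf{v}_i$ is a linear combination of the other vectors.


Proof: Since this is an 'if and only if' statement, we must prove it both
ways. If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is linearly dependent, the equation

$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k = \mathbf{0}$$

has a solution with some $\alpha_i \neq 0$. Then we can solve for the vector $\mathbf{v}_i$:

$$\mathbf{v}_i = -\frac{\alpha_1}{\alpha_i}\mathbf{v}_1 - \frac{\alpha_2}{\alpha_i}\mathbf{v}_2 - \cdots - \frac{\alpha_{i-1}}{\alpha_i}\mathbf{v}_{i-1} - \frac{\alpha_{i+1}}{\alpha_i}\mathbf{v}_{i+1} - \cdots - \frac{\alpha_k}{\alpha_i}\mathbf{v}_k,$$

which expresses $\mathbf{v}_i$ as a linear combination of the other vectors in the
set.

If $\mathbf{v}_i$ is a linear combination of the other vectors, say

$$\mathbf{v}_i = \beta_1\mathbf{v}_1 + \cdots + \beta_{i-1}\mathbf{v}_{i-1} + \beta_{i+1}\mathbf{v}_{i+1} + \cdots + \beta_k\mathbf{v}_k,$$

then

$$\beta_1\mathbf{v}_1 + \cdots + \beta_{i-1}\mathbf{v}_{i-1} - \mathbf{v}_i + \beta_{i+1}\mathbf{v}_{i+1} + \cdots + \beta_k\mathbf{v}_k = \mathbf{0}$$

is a non-trivial linear combination of the vectors that is equal to the zero
vector, since the coefficient of $\mathbf{v}_i$ is $-1 \neq 0$. Therefore, the vectors are
linearly dependent.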
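
The first half of the proof is constructive, and can be sketched in a few lines of Python. The function names below are hypothetical, and the sample data are the vectors (1, 2, 3), (2, 1, 5), (4, 5, 11) with the dependence relation of Example 6.5; exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction


def linear_combination(coeffs, vectors):
    """Return sum_j coeffs[j] * vectors[j], computed componentwise."""
    n = len(vectors[0])
    return [sum(c * v[j] for c, v in zip(coeffs, vectors)) for j in range(n)]


def solve_for(i, alphas):
    """Given a dependence relation sum_j alphas[j] * v_j = 0 with alphas[i] != 0,
    return the coefficients beta_j (j != i) such that v_i = sum_j beta_j * v_j."""
    if alphas[i] == 0:
        raise ValueError("the coefficient of v_i must be non-zero")
    return {j: Fraction(-alphas[j], alphas[i])
            for j in range(len(alphas)) if j != i}


v1, v2, v3 = [1, 2, 3], [2, 1, 5], [4, 5, 11]
alphas = [2, 1, -1]  # 2*v1 + v2 - v3 = 0

print(linear_combination(alphas, [v1, v2, v3]))  # the zero vector
print(solve_for(2, alphas))  # coefficients expressing v3 in terms of v1, v2
print(solve_for(0, alphas))  # coefficients expressing v1 in terms of v2, v3
```

Note that any vector whose coefficient in the relation is non-zero can be solved for, not only the last one.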


Theorem 6.6 has the following consequence. 


Corollary 6.7 Two vectors are linearly dependent if and only if at least 
one vector is a scalar multiple of the other. 


Example 6.8 The vectors 
in the example above are linearly independent, since neither is a scalar
multiple of the other. 


If V is any vector space, and $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\} \subseteq V$, then the set
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k, \mathbf{0}\}$ is linearly dependent. Why? Well, we can write

$$0\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + 0\mathbf{v}_k + a\mathbf{0} = \mathbf{0},$$

where $a \neq 0$ is any real number (for example, let a = 1). This is a
non-trivial linear combination equal to the zero vector. Therefore, we
have shown the following.


Theorem 6.9 In a vector space V, a non-empty set of vectors which
contains the zero vector is linearly dependent.


6.1.1 Uniqueness of linear combinations 
There is an important property of linearly independent sets of vectors 
which holds for any vector space V. 


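Theorem 6.10 If $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ are linearly independent vectors in V
and if

$$a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_m\mathbf{v}_m = b_1\mathbf{v}_1 + b_2\mathbf{v}_2 + \cdots + b_m\mathbf{v}_m,$$

then

$$a_1 = b_1, \quad a_2 = b_2, \quad \ldots, \quad a_m = b_m.$$

Activity 6.11 Prove this. Use the fact that

$$a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_m\mathbf{v}_m = b_1\mathbf{v}_1 + b_2\mathbf{v}_2 + \cdots + b_m\mathbf{v}_m$$

if and only if

$$(a_1 - b_1)\mathbf{v}_1 + (a_2 - b_2)\mathbf{v}_2 + \cdots + (a_m - b_m)\mathbf{v}_m = \mathbf{0}.$$


What does this theorem say about $\mathbf{x} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_m\mathbf{v}_m$?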
(Pause for a moment and think about this before you continue reading.) 

It says that if a vector x can be expressed as a linear combination 
of linearly independent vectors, then this can be done in only one way: 
the linear combination is unique. 


6.1.2 Testing for linear independence in $\mathbb{R}^n$


Given k vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k \in \mathbb{R}^n$, the vector expression

$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k$$

equals $A\mathbf{x}$, where A is the $n \times k$ matrix whose columns are the vectors
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ and $\mathbf{x}$ is the vector $\mathbf{x} = (\alpha_1, \alpha_2, \ldots, \alpha_k)^T$. (This is by
Theorem 1.38.) So the equation

$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k = \mathbf{0}$$

is equivalent to the matrix equation $A\mathbf{x} = \mathbf{0}$, which is a homogeneous
system of n linear equations in k unknowns. Then, the question of
whether or not a set of vectors in $\mathbb{R}^n$ is linearly independent can
be answered by looking at the solutions of the homogeneous system
$A\mathbf{x} = \mathbf{0}$. We state this practical relationship as the following theorem:


Theorem 6.12 The vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ in $\mathbb{R}^n$ are linearly dependent
if and only if the linear system $A\mathbf{x} = \mathbf{0}$ has a solution other than $\mathbf{x} = \mathbf{0}$,
where A is the matrix $A = (\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_k)$. Equivalently, the vectors are
linearly independent precisely when the only solution to the system is
$\mathbf{x} = \mathbf{0}$.


If the vectors are linearly dependent, then any solution $\mathbf{x} \neq \mathbf{0}$,
$\mathbf{x} = (\alpha_1, \ldots, \alpha_k)^T$ of the system $A\mathbf{x} = \mathbf{0}$ will directly give a non-trivial
linear combination of the vectors that equals the zero vector,
using the relationship that $A\mathbf{x} = \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k$.
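
Theorem 6.12 translates directly into a procedure: form A from the vectors, row reduce, and read off a non-zero solution of Ax = 0 if there is a free variable. The following is a minimal sketch in plain Python using exact fractions; the function names `rref` and `dependence_relation` are our own, and the test data are the vectors of Example 6.5.

```python
from fractions import Fraction


def rref(rows):
    """Return (R, pivots): the reduced row echelon form of the matrix `rows`
    and the list of its pivot (leading one) columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: it gives a free variable
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


def dependence_relation(vectors):
    """Return a non-zero solution x of A x = 0, where A has the given vectors
    as its columns, or None if the vectors are linearly independent."""
    k, n = len(vectors), len(vectors[0])
    A = [[vectors[j][i] for j in range(k)] for i in range(n)]
    R, pivots = rref(A)
    free = [c for c in range(k) if c not in pivots]
    if not free:
        return None
    f = free[0]
    x = [Fraction(0)] * k
    x[f] = Fraction(1)  # set one free variable to 1, any others to 0
    for row_index, c in enumerate(pivots):
        x[c] = -R[row_index][f]
    return x


print(dependence_relation([[1, 2, 3], [2, 1, 5], [4, 5, 11]]))
print(dependence_relation([[1, 2], [1, -1]]))
```

For the first set the solution returned is a scalar multiple of the coefficients (2, 1, -1) of the relation given in Example 6.5; for the second set the function returns `None`, since those two vectors are linearly independent.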


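Example 6.13 The vectors

$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \quad \mathbf{v}_3 = \begin{pmatrix} 2 \\ -5 \end{pmatrix}$$

are linearly dependent. To show this, and to find a linear dependence
relationship, we solve $A\mathbf{x} = \mathbf{0}$ by reducing the coefficient matrix A to
reduced row echelon form:

$$A = \begin{pmatrix} 1 & 1 & 2 \\ 2 & -1 & -5 \end{pmatrix} \longrightarrow \cdots \longrightarrow \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 3 \end{pmatrix}.$$

There is one non-leading variable, so the general solution is

$$\mathbf{x} = \begin{pmatrix} t \\ -3t \\ t \end{pmatrix}, \quad t \in \mathbb{R}.$$

In particular, taking t = 1, and using the relationship

$$A\mathbf{x} = t\mathbf{v}_1 - 3t\mathbf{v}_2 + t\mathbf{v}_3 = \mathbf{0},$$

we have that

$$\begin{pmatrix} 1 \\ 2 \end{pmatrix} - 3\begin{pmatrix} 1 \\ -1 \end{pmatrix} + \begin{pmatrix} 2 \\ -5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

which is a non-trivial linear combination equal to the zero vector.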
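Activity 6.14 This is the method used to find the linear combination
given in Example 6.5 for the vectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$,

$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} 2 \\ 1 \\ 5 \end{pmatrix}, \quad \mathbf{v}_3 = \begin{pmatrix} 4 \\ 5 \\ 11 \end{pmatrix}.$$

Find the solution of $\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \alpha_3\mathbf{v}_3 = \mathbf{0}$ to obtain a linear
dependence relation.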


Continuing with this line of thought, we know from our experience of
solving linear systems with row operations that the system $A\mathbf{x} = \mathbf{0}$ will
have precisely the one solution $\mathbf{x} = \mathbf{0}$ if and only if we obtain from the
$n \times k$ matrix A an echelon matrix in which there are k leading ones.
That is, if and only if rank(A) = k. (Make sure you recall why this is
true.) Thus, we have the following result:


Theorem 6.15 Suppose $\mathbf{v}_1, \ldots, \mathbf{v}_k \in \mathbb{R}^n$. The set $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is linearly
independent if and only if the $n \times k$ matrix $A = (\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_k)$
has rank k.


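But the rank is always at most the number of rows, so we certainly need
to have $k \le n$. Also, there is a set of n linearly independent vectors in
$\mathbb{R}^n$. In fact, there are infinitely many such sets, but an obvious one is

$$\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\},$$

where $\mathbf{e}_i$ is the vector with every entry equal to 0 except for the ith
entry, which is 1. That is,

$$\mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad \mathbf{e}_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}.$$

This set of vectors is known as the standard basis of $\mathbb{R}^n$.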


Activity 6.16 Show that the set of vectors

$$\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$$

in $\mathbb{R}^n$ is linearly independent.


Therefore, we have established the following result:


Theorem 6.17 The maximum size of a linearly independent set of
vectors in $\mathbb{R}^n$ is n.

So any set of more than n vectors in $\mathbb{R}^n$ is linearly dependent. On the
other hand, it should not be imagined that any set of n or fewer is linearly
independent; that is not true.


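Example 6.18 In $\mathbb{R}^4$, which of the following sets of vectors are linearly
independent?

$$L_1 = \left\{ \begin{pmatrix} 1 \\ 0 \\ -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 9 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \\ 3 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ 5 \\ 9 \\ 1 \end{pmatrix} \right\},$$

$$L_2 = \left\{ \begin{pmatrix} 1 \\ 0 \\ -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 9 \\ 2 \end{pmatrix} \right\}, \quad L_3 = \left\{ \begin{pmatrix} 1 \\ 0 \\ -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 9 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \\ 3 \\ 1 \end{pmatrix} \right\},$$

$$L_4 = \left\{ \begin{pmatrix} 1 \\ 0 \\ -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 9 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \\ 3 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} \right\}.$$

Try this yourself before reading the answers.

The set $L_1$ is linearly dependent because it consists of five vectors
in $\mathbb{R}^4$. The set $L_2$ is linearly independent because neither vector is a
scalar multiple of the other. To see that the set $L_3$ is linearly dependent,
write the vectors as the columns of a matrix A and reduce A to echelon
form to find that the rank of A is 2. This means that there is a non-trivial
linear combination of the vectors which is equal to $\mathbf{0}$, or, equivalently,
that one of the vectors is a linear combination of the other two. The
last set, $L_4$, contains the set $L_3$ and is therefore also linearly dependent,
since it is still true that one of the vectors is a linear combination of the
others.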
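
The answers to Example 6.18 can be cross-checked with the rank test of Theorem 6.15. The sketch below is a plain-Python illustration with exact fractions; the function names are our own, and the vectors are those of the four sets as we read them from the example.

```python
from fractions import Fraction


def rank(columns):
    """Rank of the matrix whose columns are the given vectors, found by
    reducing to echelon form (forward elimination only)."""
    n = len(columns[0])
    rows = [[Fraction(col[i]) for col in columns] for i in range(n)]
    rk = 0
    for c in range(len(columns)):
        p = next((i for i in range(rk, n) if rows[i][c] != 0), None)
        if p is None:
            continue  # no leading entry in this column
        rows[rk], rows[p] = rows[p], rows[rk]
        for i in range(rk + 1, n):
            factor = rows[i][c] / rows[rk][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def is_linearly_independent(vectors):
    """Theorem 6.15: k vectors are independent iff the rank equals k."""
    return rank(vectors) == len(vectors)


a, b, c = (1, 0, -1, 0), (1, 2, 9, 2), (2, 1, 3, 1)
d, e = (0, 0, 1, 0), (2, 5, 9, 1)

for name, vectors in [("L1", [a, b, c, d, e]), ("L2", [a, b]),
                      ("L3", [a, b, c]), ("L4", [a, b, c, d])]:
    print(name, rank(vectors), is_linearly_independent(vectors))
```

Only the set of two vectors passes the test; each of the others has rank smaller than its number of vectors.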


Activity 6.19 For the set L3 above, find the solution of the correspond- 
ing homogeneous system Ax = 0, where A is the matrix whose columns 
are the vectors of L3. Use the solution to write down a non-trivial linear 
combination of the vectors that is equal to the zero vector. Express one 
of the vectors as a linear combination of the other two. 


6.1.3 Linear independence and span 


As we have just seen in Example 6.18, if we have a linearly dependent 
set of vectors, and if we add to the set another vector, then the set is 
still linearly dependent, because it is still true that at least one of the
vectors is a linear combination of the others. This is true whether or 
not the vector we add is a linear combination of the vectors already in 
the set. 

On the other hand, if we have a linearly independent set of vectors, 
and if we add another vector to the set, then the new set may or may not 
be linearly independent, depending on the vector we add to the set. The 
following is a very useful result which tells us that if we have a linearly 
independent set of vectors and add to the set a vector which is not a 
linear combination of those vectors, then the new set is still linearly 
independent. (Clearly, if we were to add to the set a vector which is a 
linear combination of the vectors in the set, then the new set would be 
linearly dependent by Theorem 6.6.) 


Theorem 6.20 If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a linearly independent set of
vectors in a vector space V and if $\mathbf{w} \in V$ is not in the linear span
of S, $\mathbf{w} \notin \mathrm{Lin}(S)$, then the set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k, \mathbf{w}\}$ is linearly
independent.

Proof. To show that the set $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k, \mathbf{w}\}$ is linearly independent,
we need to show that the vector equation

$$a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_k\mathbf{v}_k + b\mathbf{w} = \mathbf{0}$$

has only the trivial solution. If $b \neq 0$, then we can solve the vector
equation for $\mathbf{w}$ and hence express $\mathbf{w}$ as a linear combination of the
vectors in S, which would contradict the assumption that $\mathbf{w} \notin \mathrm{Lin}(S)$.
Therefore, we must have b = 0. But that leaves the expression

$$a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_k\mathbf{v}_k = \mathbf{0},$$

and since S is linearly independent, all the coefficients $a_i$ must be 0.
Hence the set $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k, \mathbf{w}\}$ is linearly independent.


Now suppose we have a set of vectors $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ which spans
a vector space V, so V = Lin(S). If the set of vectors is linearly independent,
and if we remove a vector, say $\mathbf{v}_i$, from the set, then the smaller
set of $k - 1$ vectors cannot span V, because $\mathbf{v}_i$ (which belongs to V) is
not a linear combination of the remaining vectors. On the other hand,
if the set is linearly dependent, then some vector, say $\mathbf{v}_i$, is a linear
combination of the others; we can safely remove it and the set of $k - 1$
vectors will still span V.
6.1.4 Linear independence and span in $\mathbb{R}^n$


If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a set of vectors in $\mathbb{R}^n$, then we have seen
that the questions of whether or not the set S spans $\mathbb{R}^n$ or is linearly
independent can be answered by looking at the matrix A whose columns
are the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$.

The set S spans $\mathbb{R}^n$ if we can show that the system of linear equations
$A\mathbf{x} = \mathbf{v}$ has a solution for all $\mathbf{v} \in \mathbb{R}^n$; that is, if the system of equations
$A\mathbf{x} = \mathbf{v}$ is consistent for all $\mathbf{v} \in \mathbb{R}^n$. We looked at this in Section 4.2
(page 135): if the $n \times k$ matrix A has rank n, then S will span $\mathbb{R}^n$. So
we must have $k \ge n$.

In Section 6.1.2 (Theorem 6.12), we saw that the set $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$
is linearly independent if and only if the system of
equations $A\mathbf{x} = \mathbf{0}$ has a unique solution, namely the trivial solution, so
if and only if the matrix A has rank k. Therefore, we must have $k \le n$.

So to do both, that is, to span $\mathbb{R}^n$ and to be linearly independent, the
set S must have precisely n vectors. If we have a set of n vectors
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ in $\mathbb{R}^n$, then the matrix A whose columns are the vectors
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is a square $n \times n$ matrix. Therefore, to decide if they span
$\mathbb{R}^n$ or if they are linearly independent, we only need to evaluate $|A|$.


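Example 6.21 The set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{w}\}$, where

$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} 2 \\ 1 \\ 5 \end{pmatrix}, \quad \mathbf{w} = \begin{pmatrix} 4 \\ 5 \\ 1 \end{pmatrix},$$

is linearly independent. The vector equation $a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + a_3\mathbf{w} = \mathbf{0}$
has only the trivial solution, since

$$|A| = \begin{vmatrix} 1 & 2 & 4 \\ 2 & 1 & 5 \\ 3 & 5 & 1 \end{vmatrix} = 30 \neq 0.$$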
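
The determinant in Example 6.21 can be checked with a short cofactor expansion. This is only an illustrative sketch (the function name `det` is ours); expanding along the first row is fine for small matrices but becomes slow for large ones.

```python
def det(M):
    """Determinant of a square matrix (list of rows) by cofactor expansion
    along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        # Minor: delete row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


A = [[1, 2, 4],
     [2, 1, 5],
     [3, 5, 1]]       # columns v1, v2, w of Example 6.21
B = [[1, 2, 4],
     [2, 1, 5],
     [3, 5, 11]]      # columns v1, v2, v3 of Example 6.5

print(det(A), det(B))
```

The first determinant is non-zero, in agreement with the example, while the second is zero, as it must be for the linearly dependent vectors of Example 6.5.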


To emphasise how the properties of linear independence and span work
together in $\mathbb{R}^n$, we will prove the following result, which shows explicitly
that a linearly independent set of n vectors in $\mathbb{R}^n$ also spans $\mathbb{R}^n$.


Theorem 6.22 If $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are linearly independent vectors in $\mathbb{R}^n$,
then for any $\mathbf{x}$ in $\mathbb{R}^n$, $\mathbf{x}$ can be written as a unique linear combination
of $\mathbf{v}_1, \ldots, \mathbf{v}_n$.


Proof. Because $\mathbf{v}_1, \ldots, \mathbf{v}_n$ are linearly independent, the $n \times n$ matrix

$$A = (\mathbf{v}_1\ \mathbf{v}_2\ \ldots\ \mathbf{v}_n)$$

has rank(A) = n. (In other words, A reduces to the $n \times n$ identity
matrix.) By Theorem 4.5, the system $A\mathbf{z} = \mathbf{x}$ has a unique solution for
any $\mathbf{x} \in \mathbb{R}^n$. But let's spell it out. Since there is a leading one in every
row of the reduced echelon form of A, we can find a solution to $A\mathbf{z} = \mathbf{x}$,
so any vector $\mathbf{x}$ can be expressed in the form

$$\mathbf{x} = A\mathbf{z} = (\mathbf{v}_1\ \mathbf{v}_2\ \ldots\ \mathbf{v}_n)\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix},$$

where we have written $\mathbf{z}$ as $(\alpha_1, \alpha_2, \ldots, \alpha_n)^T$. Expanding this matrix
product, we have that any $\mathbf{x} \in \mathbb{R}^n$ can be expressed as a linear
combination

$$\mathbf{x} = \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_n\mathbf{v}_n,$$

as required. This linear combination is unique since the vectors are
linearly independent (or, because there is a leading one in every column
of the echelon matrix, so there are no free variables).


It follows from this theorem that if we have a set of n linearly independent
vectors in $\mathbb{R}^n$, then the set of vectors also spans $\mathbb{R}^n$. So any vector
in $\mathbb{R}^n$ can be expressed in exactly one way as a linear combination of
the n vectors. We say that the n vectors form a basis of $\mathbb{R}^n$. This is the
subject of the next section.


6.2 Bases 


Consider a set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ in a vector space V. We have
seen two concepts associated with this set. If the set $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$
spans V, then any vector $\mathbf{x} \in V$ can be expressed as a linear combination
of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$. If the set $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is linearly
independent and if a vector $\mathbf{x} \in V$ can be expressed as a linear combination
of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$, then this expression is unique.

If a set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ has both properties (if it spans
V and it is linearly independent), then every vector $\mathbf{v} \in V$ can be
expressed as a unique linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$. This gives
us the important concept of a basis.


Definition 6.23 (Basis) Let V be a vector space. Then the subset
$B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ of V is said to be a basis for (or of) V if:


(1) B is a linearly independent set of vectors, and
(2) B spans V; that is, V = Lin(B).

An alternative characterisation of a basis can be given. The set
$B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis of V if every vector in V can be expressed
in exactly one way as a linear combination of the vectors in B. The set
B spans V if and only if each vector in V is a linear combination of
the vectors in B; and B is linearly independent if and only if any linear
combination of vectors in B is unique. We have therefore shown:


Theorem 6.24 $B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis of V if and only if any
$\mathbf{v} \in V$ is a unique linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$.


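Example 6.25 The vector space $\mathbb{R}^n$ has the basis $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ where
$\mathbf{e}_i$ is (as earlier) the vector with every entry equal to 0 except for the
ith entry, which is 1. It is clear that the vectors are linearly independent
(as you showed in Activity 6.16 on page 177), and it is easy to see that
they span the whole of $\mathbb{R}^n$, since for any $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n$,
$\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n$. That is,

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = x_1\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + x_2\begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix} + \cdots + x_n\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}.$$

The basis $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ is the standard basis of $\mathbb{R}^n$.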


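Example 6.26 We will find a basis of the subspace of $\mathbb{R}^3$ given by

$$W = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; x + y - 3z = 0 \right\}.$$

If $\mathbf{x} = (x, y, z)^T$ is any vector in W, then its components must satisfy
$y = -x + 3z$, and we can express $\mathbf{x}$ as

$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ -x + 3z \\ z \end{pmatrix} = x\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix} + z\begin{pmatrix} 0 \\ 3 \\ 1 \end{pmatrix} = x\mathbf{v} + z\mathbf{w} \quad (x, z \in \mathbb{R}).$$

This shows that the set $\{\mathbf{v}, \mathbf{w}\}$ spans W. The set is also linearly independent.
Why? Because of the positions of the zeros and ones, if
$\alpha\mathbf{v} + \beta\mathbf{w} = \mathbf{0}$, then, necessarily, $\alpha = 0$ and $\beta = 0$.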


Example 6.27 The set 
1 1 
saa) 


is a basis of $\mathbb{R}^2$. We can show this either using Theorem 6.22, or by
showing that it spans $\mathbb{R}^2$ and is linearly independent, or, equivalently,
that any vector $\mathbf{b} \in \mathbb{R}^2$ is a unique linear combination of these two
vectors. We will do the latter. Writing the vectors as the columns of a
matrix A, we find that $|A| \neq 0$, so this is true by Theorem 4.5.


As in the above example, we can show that n vectors in $\mathbb{R}^n$ are a
basis of $\mathbb{R}^n$ by writing them as the columns of a matrix A and invoking
Theorem 4.5. Turning this around, we can see that if $A = (\mathbf{v}_1\ \mathbf{v}_2\ \ldots\ \mathbf{v}_n)$
is an $n \times n$ matrix with rank(A) = n, then the columns of A are a basis
of $\mathbb{R}^n$. Indeed, by Theorem 4.5, the system $A\mathbf{z} = \mathbf{x}$ will have a unique
solution for any $\mathbf{x} \in \mathbb{R}^n$, so any vector $\mathbf{x} \in \mathbb{R}^n$ can be written as a unique
linear combination of the column vectors. We therefore have two more
equivalent statements to add to Theorem 4.5, resulting in the following
extended version of that result:


Theorem 6.28 If A is an $n \times n$ matrix, then the following statements
are equivalent:


• $A^{-1}$ exists.

• $A\mathbf{x} = \mathbf{b}$ has a unique solution for any $\mathbf{b} \in \mathbb{R}^n$.

• $A\mathbf{x} = \mathbf{0}$ has only the trivial solution, $\mathbf{x} = \mathbf{0}$.

• The reduced echelon form of A is I.

• $|A| \neq 0$.

• The rank of A is n.

• The column vectors of A are a basis of $\mathbb{R}^n$.

• The rows of A (written as vectors) are a basis of $\mathbb{R}^n$.


The last statement can be seen from the facts that $|A^T| = |A|$ and
the rows of A are the columns of $A^T$. This theorem provides an easy
way to determine if a set of n vectors is a basis of $\mathbb{R}^n$. We simply
write the n vectors as the columns of a matrix and evaluate its
determinant.


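Example 6.29 The vector space $\mathrm{Lin}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{w}\}$, where

$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} 2 \\ 1 \\ 5 \end{pmatrix}, \quad \mathbf{w} = \begin{pmatrix} 4 \\ 5 \\ 1 \end{pmatrix},$$

is $\mathbb{R}^3$, and the set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{w}\}$ is a basis. Why? In Example 6.21,
we showed that the set $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{w}\}$ is linearly independent, and since
it contains three vectors in $\mathbb{R}^3$, it is a basis of $\mathbb{R}^3$. (In fact, we
showed that $|A| \neq 0$, where A is the matrix whose column vectors are
$\mathbf{v}_1, \mathbf{v}_2, \mathbf{w}$.)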
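What about the vector space $U = \mathrm{Lin}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, where $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ are
the following vectors from Example 6.5?

$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} 2 \\ 1 \\ 5 \end{pmatrix}, \quad \mathbf{v}_3 = \begin{pmatrix} 4 \\ 5 \\ 11 \end{pmatrix}.$$

This set of vectors is linearly dependent since $\mathbf{v}_3 = 2\mathbf{v}_1 + \mathbf{v}_2$, so we
know that $\mathbf{v}_3 \in \mathrm{Lin}\{\mathbf{v}_1, \mathbf{v}_2\}$. Therefore, $\mathrm{Lin}\{\mathbf{v}_1, \mathbf{v}_2\} = \mathrm{Lin}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.
Furthermore, $\{\mathbf{v}_1, \mathbf{v}_2\}$ is linearly independent since neither vector is a
scalar multiple of the other, so this space is the linear span of two
linearly independent vectors in $\mathbb{R}^3$ and is therefore a plane. The set
$\{\mathbf{v}_1, \mathbf{v}_2\}$ is a basis of U. A parametric equation of this plane is given
by

$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = s\mathbf{v}_1 + t\mathbf{v}_2 = s\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + t\begin{pmatrix} 2 \\ 1 \\ 5 \end{pmatrix}, \quad s, t \in \mathbb{R},$$

and we could find a Cartesian equation by eliminating the variables s
and t from the component equations. But there is a much simpler way.
The vector $\mathbf{x}$ belongs to U if and only if $\mathbf{x}$ can be expressed as a linear
combination of $\mathbf{v}_1$ and $\mathbf{v}_2$, as in the equation above; that is, if and only
if $\mathbf{x}, \mathbf{v}_1, \mathbf{v}_2$ are linearly dependent. This will be the case if and only if
we have

$$|A| = \begin{vmatrix} 1 & 2 & x \\ 2 & 1 & y \\ 3 & 5 & z \end{vmatrix} = 0.$$

Expanding this determinant by column 3, we obtain

$$|A| = 7x + y - 3z = 0.$$

This is the equation of the plane.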

Activity 6.30 Carry out the calculation of the determinant. Then verify 
that 7x + y — 3z = 0 is the equation of the plane by checking that the 
vectors V1, V2, V3 each satisfy this equation. 


Another way to look at a basis is as a smallest spanning set of vectors. 


Theorem 6.31 If V is a vector space, then a smallest spanning set is a
basis of V.


Proof. Suppose we have a set of vectors S and we know that S is a
smallest spanning set for V, so V = Lin(S). If S is linearly independent,
then it is a basis. So can S be linearly dependent? If S is linearly
dependent, then there is a vector v ∈ S which is a linear combination
of the other vectors in S. But this means that we can remove v from
S, and the remaining smaller set of vectors will still span V since any
linear combination of the vectors in S will also be a linear combination
of this smaller set. But we assumed that S was a smallest spanning set,
so this is not possible. S must be linearly independent and therefore S
is a basis of V.


6.3 Coordinates 


What is the importance of a basis? If S = {v1, v2, ..., vn} is a basis of
a vector space V, then any vector v ∈ V can be expressed uniquely as
v = α1 v1 + α2 v2 + ··· + αn vn. The real numbers α1, α2, ..., αn are
the coordinates of v with respect to the basis, S.


Definition 6.32 (Coordinates) If S = {v1, v2, ..., vn} is a basis of
a vector space V and v = α1 v1 + α2 v2 + ··· + αn vn, then the real
numbers α1, α2, ..., αn are the coordinates of v with respect to the
basis, S. We use the notation

\[
[\mathbf{v}]_S = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{bmatrix}_S
\]

to denote the coordinate vector of v in the basis S.


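Example 6.33 The sets B = {e1, e2} and S = {v1, v2}, where

\[
B = \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}
\quad\text{and}\quad
S = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\},
\]

are each a basis of R^2. The coordinates of the vector v = (2, −5)^T in
each basis are given by the coordinate vectors,

\[
[\mathbf{v}]_B = \begin{bmatrix} 2 \\ -5 \end{bmatrix}_B, \qquad
[\mathbf{v}]_S = \begin{bmatrix} -1 \\ 3 \end{bmatrix}_S.
\]

In the standard basis, the coordinates of v are precisely the components
of the vector v, so we just write the standard coordinates as

\[
\mathbf{v} = \begin{pmatrix} 2 \\ -5 \end{pmatrix}.
\]

In the basis S, the components of v arise from the observation that

\[
\mathbf{v} = -1\begin{pmatrix} 1 \\ 2 \end{pmatrix} + 3\begin{pmatrix} 1 \\ -1 \end{pmatrix}
= \begin{pmatrix} 2 \\ -5 \end{pmatrix}.
\]


Activity 6.34 For the example above, sketch the vector v on graph
paper and show it as the sum of the vectors given by each of the linear
combinations: v = 2e1 − 5e2 and v = −1v1 + 3v2.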


Activity 6.35 In Example 6.26, we showed that a basis of the plane W,

\[
W = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; x + y - 3z = 0 \right\},
\]

is given by the set of vectors B = {v, w}, where v = (1, −1, 0)^T and
w = (0, 3, 1)^T. Show that the vector y = (5, 1, 2)^T belongs to W and
find its coordinates in the basis B.
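Finding coordinates in a basis amounts to solving a small linear system. The following is a minimal sketch for a two-vector basis, using exact fractions; the function name `coords_2d_basis` is our own and the approach (Cramer's rule on two components, then a check of all components) is one choice among several.

```python
from fractions import Fraction
from itertools import combinations

def coords_2d_basis(v, w, y):
    """Return (a, b) with a*v + b*w == y, or None if y is not in Lin{v, w}.
    Assumes {v, w} is linearly independent."""
    for i, j in combinations(range(len(y)), 2):
        d = v[i] * w[j] - v[j] * w[i]
        if d != 0:
            # Cramer's rule on components i and j
            a = Fraction(y[i] * w[j] - y[j] * w[i], d)
            b = Fraction(v[i] * y[j] - v[j] * y[i], d)
            # y lies in the span only if every component agrees
            ok = all(a * vk + b * wk == yk for vk, wk, yk in zip(v, w, y))
            return (a, b) if ok else None
    return None

v, w = (1, -1, 0), (0, 3, 1)
coords = coords_2d_basis(v, w, (5, 1, 2))
print(coords)
```

A vector that does not lie in the plane, such as (1, 0, 0), has no coordinates with respect to B, and the function returns `None` for it.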




6.4 Dimension 
6.4.1 Definition of dimension 


A fundamental fact concerning vector spaces is that if a vector space 
V has a finite basis, meaning a basis consisting of a finite number 
of vectors, then all bases of V contain precisely the same number of 
vectors. 

In order to prove this, we first need to establish the following result. 


Theorem 6.36 Let V be a vector space with a basis

B = {v1, v2, ..., vn}

of n vectors. Then any set of n + 1 vectors is linearly dependent.


This fact is easily established for R^n, since it is a direct consequence
of Theorem 6.17 on page 177. But we will show directly that any set
of n + 1 vectors in R^n is linearly dependent, because the proof will
indicate to us how to prove the theorem for any vector space V.

If S = {w1, w2, ..., w_{n+1}} ⊆ R^n, and if A is the n × (n + 1) matrix
whose columns are the vectors w1, w2, ..., w_{n+1}, then the homogeneous
system of equations Ax = 0 will have infinitely many solutions.
Indeed, since the reduced row echelon form of A can have at most n
leading ones, there will always be a free variable, and hence infinitely
many solutions. Therefore, the set S of n + 1 vectors is linearly
dependent. Using these ideas, we can now prove the theorem in general for
any vector space V with a basis of n vectors.

Proof. Let S = {w1, w2, ..., w_{n+1}} be any set of n + 1 vectors in V.
Then each of the vectors w_i can be expressed as a unique linear
combination of the vectors in the basis B. Let

\[
\mathbf{w}_i = a_{1,i}\mathbf{v}_1 + a_{2,i}\mathbf{v}_2 + \cdots + a_{n,i}\mathbf{v}_n.
\]

Now consider any linear combination of the vectors w1, w2, ..., w_{n+1}
such that

\[
b_1\mathbf{w}_1 + b_2\mathbf{w}_2 + \cdots + b_{n+1}\mathbf{w}_{n+1} = \mathbf{0}.
\]

Substituting for the vectors w_i in the linear combination, we obtain

\[
b_1(a_{1,1}\mathbf{v}_1 + a_{2,1}\mathbf{v}_2 + \cdots + a_{n,1}\mathbf{v}_n) + \cdots
+ b_{n+1}(a_{1,n+1}\mathbf{v}_1 + a_{2,n+1}\mathbf{v}_2 + \cdots + a_{n,n+1}\mathbf{v}_n) = \mathbf{0}.
\]

Now comes the tricky bit. We rewrite the linear combination by collecting
all the terms which multiply each of the vectors v_i. We have

\[
(b_1 a_{1,1} + b_2 a_{1,2} + \cdots + b_{n+1} a_{1,n+1})\mathbf{v}_1 + \cdots
+ (b_1 a_{n,1} + b_2 a_{n,2} + \cdots + b_{n+1} a_{n,n+1})\mathbf{v}_n = \mathbf{0}.
\]

But since the set B = {v1, v2, ..., vn} is a basis, all the coefficients
must be equal to 0. This gives us a homogeneous system of n linear
equations in the n + 1 unknowns b1, b2, ..., b_{n+1},

\[
\begin{aligned}
b_1 a_{1,1} + b_2 a_{1,2} + \cdots + b_{n+1} a_{1,n+1} &= 0 \\
b_1 a_{2,1} + b_2 a_{2,2} + \cdots + b_{n+1} a_{2,n+1} &= 0 \\
&\;\;\vdots \\
b_1 a_{n,1} + b_2 a_{n,2} + \cdots + b_{n+1} a_{n,n+1} &= 0,
\end{aligned}
\]

which must therefore have a non-trivial solution. So there are constants
b1, b2, ..., b_{n+1}, not all zero, such that

\[
b_1\mathbf{w}_1 + b_2\mathbf{w}_2 + \cdots + b_{n+1}\mathbf{w}_{n+1} = \mathbf{0}.
\]

This proves that the set of vectors S = {w1, w2, ..., w_{n+1}} is linearly
dependent.
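The argument for R^n is constructive, and can be sketched in code: row reduce the matrix whose columns are the n + 1 vectors, set one free variable to 1, and read off a non-trivial solution. The sketch below is an illustration under our own naming (`rref`, `dependence_relation`); the fourth vector (1, 0, 0) is a hypothetical choice added to three vectors used earlier in the chapter.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and leading-one columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # look for a non-zero entry in column c, at or below row r
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no leading one in this column
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def dependence_relation(vectors):
    """For more vectors than components, return coefficients b, not all
    zero, with b[0]*vectors[0] + b[1]*vectors[1] + ... equal to zero."""
    A = [list(col) for col in zip(*vectors)]   # the vectors as columns
    R, pivots = rref(A)
    free = next(j for j in range(len(vectors)) if j not in pivots)
    b = [Fraction(0)] * len(vectors)
    b[free] = Fraction(1)                      # set one free variable to 1
    for row, p in zip(R, pivots):
        b[p] = -row[free]                      # solve for the leading variables
    return b

ws = [(1, 2, 3), (2, 1, 5), (4, 5, 1), (1, 0, 0)]   # four vectors in R^3
b = dependence_relation(ws)
combo = [sum(bi * w[k] for bi, w in zip(b, ws)) for k in range(3)]
print(b, combo)
```

A free variable always exists here because a matrix with n rows can have at most n leading ones, which is exactly the observation used in the text.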


Using this result, it is now a simple matter to prove the following 
theorem, which states that all bases of a vector space with a finite basis 
are the same size; that is, they have the same number of vectors. 


Theorem 6.37 Suppose that a vector space V has a finite basis
consisting of r vectors. Then any basis of V consists of exactly r vectors.

Proof. Suppose V has a basis B = {v1, v2, ..., vr} consisting of r
vectors and a basis S = {w1, w2, ..., ws} consisting of s vectors. By
Theorem 6.36, we must have s ≤ r since B is a basis, and so any set
of r + 1 vectors would be linearly dependent and therefore not a basis.
In the same way, since S is a basis, any set of s + 1 vectors would be
linearly dependent, so r ≤ s. Therefore, r = s.


This enables us to define exactly what we mean by the dimension of a 
vector space V. 


Definition 6.38 (Dimension) The number k of vectors in a finite basis 
of a vector space V is the dimension of V, and is denoted dim(V). The 
vector space V = {0} is defined to have dimension 0. 


A vector space which has a finite basis — that is, a basis consisting of 
a finite number of vectors — is said to be finite-dimensional. Not all 
vector spaces are finite-dimensional. If a vector space does not have 
a basis consisting of a finite number of vectors, then it is said to be 
infinite-dimensional. 


Example 6.39 We already know R^n has a basis of size n; for example,
the standard basis consists of n vectors. So R^n has dimension n. (This
is reassuring, since it is often referred to as n-dimensional Euclidean
space.)


Example 6.40 A plane in R^3 is a two-dimensional subspace. It can be
expressed as the linear span of a set of two linearly independent vectors.
A line in R^n is a one-dimensional subspace. A hyperplane in R^n is an
(n − 1)-dimensional subspace of R^n.


Example 6.41 The vector space F of real functions with pointwise 
addition and scalar multiplication (see Example 5.5) has no finite basis. 
It is an infinite-dimensional vector space. The set S of real-valued 
sequences (of Example 5.8) is also an infinite-dimensional vector 
space. 


If we know the dimension of a finite-dimensional vector space V, then 
we know how many vectors we need for a basis. If we have the correct 
number of vectors for a basis and we know either that the vectors span 
V or that they are linearly independent, then we can conclude that both 
must be true and they form a basis. This is shown in the following 
theorem. That is, if we know the dimension is k and we have a set of k 
vectors, then we do not need to show both. We only need to show either 
that the set is linearly independent or that it spans V.

Theorem 6.42 Let V be a finite-dimensional vector space of dimension
k. Then:


• k is the largest size of a linearly independent set of vectors in V.
  Furthermore, any set of k linearly independent vectors is necessarily
  a basis of V;

• k is the smallest size of a spanning set of vectors for V. Furthermore,
  any set of k vectors that spans V is necessarily a basis.


Thus, k = dim(V) is the largest possible size of a linearly independent
set of vectors in V, and the smallest possible size of a spanning set
of vectors (a set of vectors whose linear span is V). We have already
proved part of this theorem for R^n as Theorem 6.22.


Proof. If V has dimension k, then every basis of V contains precisely
k vectors. Now suppose we have any set S = {w1, w2, ..., wk} of k
linearly independent vectors. If S does not span V, then there must be
some vector v ∈ V which cannot be expressed as a linear combination
of w1, w2, ..., wk, and if we add this vector v to the set, then by
Theorem 6.20 the set of vectors {w1, w2, ..., wk, v} would still be
linearly independent. But we have already shown in Theorem 6.36 that
k is the maximum size of a linearly independent set of vectors; any set
of k + 1 vectors is linearly dependent. Therefore, such a vector v cannot
exist. The set S must span V, and so S is a basis of V.

To prove the next part of this theorem, suppose we have any set
S = {w1, w2, ..., wk} of k vectors which spans V. If the set is linearly
dependent, then one of the vectors can be expressed as a linear
combination of the others. In this case, we could remove it from the set and
we would have a set of k − 1 vectors which still spans V. This would
imply the existence of a basis of V with at most k − 1 vectors, since
either the new set of k − 1 vectors is linearly independent, or we could
repeat the process until we arrive at some subset which both spans and
is linearly independent. But every basis of V has precisely k vectors, by
Theorem 6.37. Therefore, the set S must be linearly independent, and
S is a basis of V. This argument also shows that it is not possible for
fewer than k vectors to span V.


Example 6.43 We know (from Example 6.26) that the plane W in R^3,

\[
W = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; x + y - 3z = 0 \right\},
\]

has dimension 2, because we found a basis for it consisting of two
vectors. If we choose any set of two linearly independent vectors
in W, then that set will be a basis of W. For example, the vectors
v1 = (1, 2, 1)^T and v2 = (3, 0, 1)^T are linearly independent (why?),
so by Theorem 6.42, S = {v1, v2} is a basis of W.


6.4.2 Dimension and bases of subspaces 


Suppose that W is a subspace of the finite-dimensional vector space 
V. Any set of linearly independent vectors in W is also a linearly 
independent set in V. 


Activity 6.44 Prove this last statement. 


Now, the dimension of W is the largest size of a linearly independent
set of vectors in W, so there is a set of dim(W) linearly independent
vectors in V. But then this means that dim(W) ≤ dim(V), since the
largest possible size of a linearly independent set in V is dim(V). There
is another important relationship between bases of W and V: this is that
any basis of W can be extended to one of V. The following result states
this precisely:


Theorem 6.45 Suppose that V is a finite-dimensional vector space and
that W is a subspace of V. Then dim(W) ≤ dim(V). Furthermore, if
{w1, w2, ..., wr} is a basis of W, then there are s = dim(V) − dim(W)
vectors v1, v2, ..., vs ∈ V such that {w1, w2, ..., wr, v1, v2, ..., vs} is
a basis of V. (In the case W = V, the basis of W is already a basis
of V.) That is, we can obtain a basis of the whole space V by adding
certain vectors of V to any basis of W.


Proof. If {w1, w2, ..., wr} is a basis of W, then the set of vectors is a
linearly independent set of vectors in V. If the set spans V, then it is a
basis of V, and W = V. If not, there is a vector v1 ∈ V, which cannot
be expressed as a linear combination of the vectors w1, w2, ..., wr.
Then the set of vectors {w1, w2, ..., wr, v1} is a linearly independent
set of vectors in V by Theorem 6.20. Continuing in this way, we can
find vectors v2, ..., vs ∈ V until the linearly independent set of vectors
{w1, w2, ..., wr, v1, v2, ..., vs} spans V and is therefore a basis. This
must occur when r + s = dim(V), so dim(W) ≤ dim(V).


Example 6.46 The plane W in R^3,

W = {x | x + y − 3z = 0},

has a basis consisting of the vectors v1 = (1, 2, 1)^T and v2 =
(3, 0, 1)^T.

Let v3 be any vector which is not in this plane. For example, the vector
v3 = (1, 0, 0)^T is not in W, since its components do not satisfy the
equation. Then the set S = {v1, v2, v3} is a basis of R^3. Why?


Activity 6.47 Answer this question. Why can you conclude that S is a
basis of R^3?
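The extension procedure in the proof of Theorem 6.45 can be sketched for subspaces of R^n as follows. This is an illustration only: the helpers `rank` and `extend_to_basis` are our own names, and trying the standard basis vectors in turn is one convenient way of choosing a vector outside the current span, not the only one.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination, exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def extend_to_basis(independent, n):
    """Extend a linearly independent list of vectors in R^n to a basis of
    R^n by trying the standard basis vectors e_1, ..., e_n in turn."""
    basis = [tuple(v) for v in independent]
    for i in range(n):
        e = tuple(1 if j == i else 0 for j in range(n))
        # e is added only if it is not in the span of the vectors so far
        if rank(basis + [e]) > len(basis):
            basis.append(e)
    return basis

basis_W = [(1, 2, 1), (3, 0, 1)]          # basis of the plane x + y - 3z = 0
basis_R3 = extend_to_basis(basis_W, 3)
print(basis_R3)
```

For the plane W of Example 6.46, the first standard basis vector already lies outside W, so the sketch adds exactly one vector, in agreement with s = dim(V) − dim(W) = 1.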


6.5 Basis and dimension in R^n
6.5.1 Row space, column space and null space


We have met three important subspaces associated with an m × n
matrix A:


• the row space is the linear span of the rows of the matrix (when they
  are written as vectors) and is a subspace of R^n,

• the null space is the set of all solutions of Ax = 0 and is also a
  subspace of R^n,

• the column space, or range of the matrix, is the linear span of the
  column vectors of A and is a subspace of R^m.


In Chapter 12, we will meet a fourth subspace associated with A, namely
N(A^T), but these are, for now, the three main ones.

In order to find a basis for each of these three spaces, we put the 
matrix A into reduced row echelon form. 

To understand how and why this works, we will first work carefully 
through an example. 


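Example 6.48 Let A be the 4 × 5 matrix

\[
A = \begin{pmatrix}
1 & 2 & 1 & 1 & 2 \\
0 & 1 & 2 & 1 & 4 \\
-1 & 3 & 9 & 1 & 9 \\
0 & 1 & 2 & 0 & 1
\end{pmatrix}.
\]

Then the row space of A, RS(A), is the linear span of the transposed
rows:

\[
RS(A) = \operatorname{Lin}\left\{
\begin{pmatrix} 1 \\ 2 \\ 1 \\ 1 \\ 2 \end{pmatrix},
\begin{pmatrix} 0 \\ 1 \\ 2 \\ 1 \\ 4 \end{pmatrix},
\begin{pmatrix} -1 \\ 3 \\ 9 \\ 1 \\ 9 \end{pmatrix},
\begin{pmatrix} 0 \\ 1 \\ 2 \\ 0 \\ 1 \end{pmatrix}
\right\}.
\]

The null space, N(A), is the set of all solutions of Ax = 0.

Whereas the row space and the null space of A are subspaces of R^5, the
column space is a subspace of R^4. Recall that the column space of a
matrix, CS(A), is the same as the range of the matrix, R(A). Although
the original definitions of the range and the column space are different,
we saw in Section 5.3.1 that they are precisely the same set of vectors,
namely the linear span of the columns of A:

\[
R(A) = CS(A) = \operatorname{Lin}\left\{
\begin{pmatrix} 1 \\ 0 \\ -1 \\ 0 \end{pmatrix},
\begin{pmatrix} 2 \\ 1 \\ 3 \\ 1 \end{pmatrix},
\begin{pmatrix} 1 \\ 2 \\ 9 \\ 2 \end{pmatrix},
\begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix},
\begin{pmatrix} 2 \\ 4 \\ 9 \\ 1 \end{pmatrix}
\right\}.
\]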


We put the matrix A into reduced row echelon form using elemen- 
tary row operations. Each one of these row operations involves replac- 
ing one row of the matrix with a linear combination of that row and 
another row: for example, our first step will be to replace row 3 with 
‘row 3 + row 1’. We will let R denote the matrix which is the reduced 
row echelon form of A. 


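\[
A \longrightarrow \cdots \longrightarrow R = \begin{pmatrix}
1 & 0 & -3 & 0 & -3 \\
0 & 1 & 2 & 0 & 1 \\
0 & 0 & 0 & 1 & 3 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]

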
Activity 6.49 Carry out the row reduction of A to obtain the matrix R. 
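For readers who want to check their row reduction by machine, here is a minimal sketch of Gauss-Jordan elimination in Python using exact fractions. The function name `rref` and its return convention (the reduced matrix together with the zero-based indices of the columns containing leading ones) are our own choices for illustration.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and leading-one columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # look for a non-zero entry in column c, at or below row r
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no leading one in this column
        m[r], m[p] = m[p], m[r]              # swap it into position
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]      # scale to get a leading one
        for i in range(len(m)):
            if i != r:                       # clear the rest of the column
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

A = [[1, 2, 1, 1, 2],
     [0, 1, 2, 1, 4],
     [-1, 3, 9, 1, 9],
     [0, 1, 2, 0, 1]]
R, pivots = rref(A)
for row in R:
    print([str(x) for x in row])
print(pivots)
```

The order of row operations differs from a hand calculation, but the reduced row echelon form of a matrix is unique, so the result can be compared directly with R above. The pivot indices are zero-based, so `[0, 1, 3]` corresponds to columns 1, 2 and 4.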


The row space of the matrix R is the linear span of the rows of R 
(written as vectors) and it is clear that a basis for this is given by the 
non-zero rows. Why is this? Each of these rows begins with a leading 
one, and since the rows below have zeros beneath the leading ones of 
the rows above, the set 


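\[
\left\{
\begin{pmatrix} 1 \\ 0 \\ -3 \\ 0 \\ -3 \end{pmatrix},
\begin{pmatrix} 0 \\ 1 \\ 2 \\ 0 \\ 1 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 3 \end{pmatrix}
\right\}
\]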


is linearly independent and is therefore a basis of RS(R). 


Activity 6.50 Validate this argument: use the definition of linear inde- 
pendence to show that this set of vectors is linearly independent. 


But RS(R) = RS(A). Why? Each of the rows of R is a linear combination
of the rows of A, obtained by performing the row operations.
Therefore, RS(R) ⊆ RS(A). But each of these row operations is reversible,
so each of the rows of A can be obtained as a linear combination
of the rows of R, so RS(A) ⊆ RS(R). Therefore, RS(A) = RS(R) and
we have found a basis for the row space of A. It is the set of non-zero
rows (written as vectors) in the reduced row echelon form of A. These
are the rows with a leading one.

In this example, the row space of A is a three-dimensional subspace
of R^5, where rank(A) = 3.

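To find a basis of the null space of A, we write down the general
solution of Ax = 0, with x = (x1, x2, x3, x4, x5)^T. Looking at R, we see
that the non-leading variables, corresponding to the columns without a
leading one, are x3 and x5. If we set x3 = s and x5 = t, where s, t ∈ R
represent any real numbers, then the solution is

\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}
= \begin{pmatrix} 3s + 3t \\ -2s - t \\ s \\ -3t \\ t \end{pmatrix}
= s\begin{pmatrix} 3 \\ -2 \\ 1 \\ 0 \\ 0 \end{pmatrix}
+ t\begin{pmatrix} 3 \\ -1 \\ 0 \\ -3 \\ 1 \end{pmatrix}
= s\mathbf{v}_1 + t\mathbf{v}_2, \qquad s, t \in \mathbb{R}.
\]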


Activity 6.51 Check this solution. Check the steps indicated and check
that Av1 = 0 and Av2 = 0.


The set of vectors {v1, v2} is a basis of the null space, N(A). Why? They
span the null space since every solution of Ax = 0 can be expressed as a
linear combination of v1, v2, and they are linearly independent because
of the positions of the zeros and ones as the third and fifth components;
the only linear combination sv1 + tv2 which is equal to the zero vector
is given by s = t = 0. This is not an accident. The assignment of the
arbitrary parameters, s to x3 and t to x5, ensures that the vector v1 will
have a ‘1’ as its third component and a ‘0’ as its fifth component, and
vice versa for v2.

In this example, the null space of A is a two-dimensional subspace
of R^5. Here 2 = n − r, where n = 5 is the number of columns of A and
r = 3 is the rank of the matrix. Since A has r leading ones, there are
n − r non-leading variables, and each of these determines one of the
basis vectors for N(A).
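The recipe "one basis vector per non-leading variable" can be sketched as follows. The helper `null_space_basis` is a name of our own: for each free column it sets that variable to 1, the other free variables to 0, and reads the leading variables off the reduced row echelon form.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and leading-one columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_space_basis(rows):
    """One basis vector of N(A) for each non-leading (free) variable."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (j for j in range(n) if j not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)          # this free variable is 1, the others 0
        for row, p in zip(R, pivots):
            v[p] = -row[free]          # leading variables from the rows of R
        basis.append(v)
    return basis

A = [[1, 2, 1, 1, 2],
     [0, 1, 2, 1, 4],
     [-1, 3, 9, 1, 9],
     [0, 1, 2, 0, 1]]
basis = null_space_basis(A)
# Check that each basis vector really solves Ax = 0.
residuals = [[sum(a * x for a, x in zip(row, v)) for row in A] for v in basis]
print([[str(x) for x in v] for v in basis])
```

For the matrix A of Example 6.48 this reproduces the vectors v1 and v2 found above, and the pattern of ones and zeros in the free positions is what makes the vectors linearly independent.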

Finding a basis of the column space of A from the reduced row
echelon form is not as obvious. The columns of R are very different
from the columns of A. In particular, any column with a leading one
has zeros elsewhere, so the columns with leading ones in the reduced
row echelon form of any matrix are vectors of the standard basis of
R^m. In our example, these are e1, e2, e3 ∈ R^4, which are in columns 1,
2 and 4 of R. But we can use this information to deduce which of the
column vectors of A are linearly independent, and by finding a linearly
independent subset of the spanning set, we obtain a basis.

Let c1, c2, c3, c4, c5 denote the columns of A. Then, if we choose
the columns corresponding to the leading ones in the reduced row
echelon form of A, namely c1, c2, c4, these three vectors are linearly
independent. Why? Because if we row reduce the matrix consisting of
these three columns,

\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ -1 & 3 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\longrightarrow \cdots \longrightarrow
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},
\]

the reduced row echelon form will have a leading one in every column.
On the other hand, if we include either of the vectors c3 or c5, then this
will no longer be true, and the set will be linearly dependent.


Activity 6.52 Make sure you understand why this is true.


Since {c1, c2, c4} is linearly independent, but {c1, c2, c4, c3} is linearly
dependent, we know that c3 can be expressed as a linear combination
of c1, c2, c4. The same applies to c5, so

CS(A) = Lin{c1, c2, c3, c4, c5} = Lin{c1, c2, c4}

and we have found a basis, namely {c1, c2, c4}.

So in our example, the range, or column space of A is a three-dimensional
subspace of R^4, where rank(A) = 3. Our basis consists of
those columns of A which correspond to the columns of the reduced
row echelon form with the leading ones.


The same considerations can be applied to any m x n matrix A, as we 
shall see in the next section. 

If we are given k vectors v1, v2, ..., vk in R^n and we want to find a
basis for the linear span V = Lin{v1, v2, ..., vk}, then we have a choice
of how to do this using matrices. The point is that the k vectors might
not form a linearly independent set (and hence they are not a basis). One
method to obtain a basis for V is to write the spanning set of vectors
as the rows of a k × n matrix and find a basis of the row space. In
this case, we will obtain a simplified set of vectors for the basis (in the
sense that there will be leading ones and zeros in the vectors), which
are linear combinations of the original vectors. Alternatively, we can
write the spanning set of vectors as the columns of an n × k matrix and
find a basis of the column space. This will consist of a subset of the
original spanning set, namely those vectors in the spanning set which
correspond to the columns with the leading ones in the reduced echelon
form.
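The two methods can be compared side by side in a short sketch. The function names `basis_from_rows` and `basis_from_columns` are hypothetical, and the spanning set used is {v1, v2, v3} from Example 6.5, for which v3 = 2v1 + v2.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and leading-one columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def basis_from_rows(vectors):
    """Row method: the non-zero rows of the RREF of the matrix whose rows
    are the given vectors (a simplified basis, usually new vectors)."""
    R, _ = rref(vectors)
    return [row for row in R if any(x != 0 for x in row)]

def basis_from_columns(vectors):
    """Column method: the original vectors in the leading-one columns."""
    A = [list(col) for col in zip(*vectors)]
    _, pivots = rref(A)
    return [vectors[j] for j in pivots]

spanning = [(1, 2, 3), (2, 1, 5), (4, 5, 11)]
row_basis = basis_from_rows(spanning)
col_basis = basis_from_columns(spanning)
print([[str(x) for x in v] for v in row_basis])
print(col_basis)
```

Both results are bases of the same plane: the row method returns two new vectors with leading ones and zeros, while the column method returns v1 and v2 themselves.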


6.5.2 The rank-nullity theorem


We have seen that the range and null space of an m × n matrix are
subspaces of R^m and R^n, respectively (Section 5.2.4). Their dimensions
are so important that they are given special names.


Definition 6.53 (Rank and nullity) The rank of a matrix A is 
rank(A) = dim(R(A)) 


and the nullity is 
nullity(A) = dim(N(A)). 


We have, of course, already used the word ‘rank’, so it had better 
be the case that the usage just given coincides with the earlier one. 
Fortunately, it does. If you look again at how we obtained a basis of the 
column space, or range, in Example 6.48 in the previous section, you 
will see the correspondence between the basis vectors and the leading 
ones in the reduced row echelon form. This connection is the content 
of the following theorem. 


Theorem 6.54 Suppose that A is an m × n matrix with columns
c1, c2, ..., cn, and that the reduced row echelon form obtained from
A has leading ones in columns i1, i2, ..., ir. Then a basis for R(A) is

\[
B = \{ \mathbf{c}_{i_1}, \mathbf{c}_{i_2}, \ldots, \mathbf{c}_{i_r} \}.
\]

Note that the basis is formed from columns of A, not columns of the
echelon matrix: the basis consists of those columns of A corresponding
to the leading ones in the reduced row echelon form.


Proof. Any solution x = (α1, α2, ..., αn)^T of Ax = 0 gives a linear
combination of the columns of A which is equal to the zero vector,

\[
\mathbf{0} = \alpha_1\mathbf{c}_1 + \alpha_2\mathbf{c}_2 + \cdots + \alpha_n\mathbf{c}_n.
\]

If R denotes the reduced echelon form of A, and if c'1, c'2, ..., c'n denote
the columns of R, then exactly the same relationship holds:

\[
\mathbf{0} = \alpha_1\mathbf{c}'_1 + \alpha_2\mathbf{c}'_2 + \cdots + \alpha_n\mathbf{c}'_n.
\]

In fact, we use R to obtain the solution x = (α1, α2, ..., αn)^T. So the
linear dependence relations are the same for the columns of both
matrices. This means that the linearly independent columns of A
correspond precisely to the linearly independent columns of R. Which
columns of R are linearly independent? The columns which contain the
leading ones. Why? (Think about this before continuing to read.)

The form of the reduced row echelon matrix R is such that the
columns with the leading ones are c'_{i1} = e1, c'_{i2} = e2, ..., c'_{ir} = er,
where e1, e2, ..., er are the first r vectors of the standard basis of R^m.
These vectors are linearly independent. Furthermore, since R has r
leading ones, the matrix has precisely r non-zero rows, so any other
column vector of R (corresponding to a column without a leading one)
is of the form

\[
\mathbf{c}'_j = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_r \\ 0 \\ \vdots \\ 0 \end{pmatrix}
= \alpha_1\mathbf{e}_1 + \cdots + \alpha_r\mathbf{e}_r.
\]

This gives a linear dependence relationship:

\[
\alpha_1\mathbf{c}'_{i_1} + \cdots + \alpha_r\mathbf{c}'_{i_r} - \mathbf{c}'_j = \mathbf{0}.
\]

The same linear dependence relationship holds for the columns of A,
so that

\[
\alpha_1\mathbf{c}_{i_1} + \cdots + \alpha_r\mathbf{c}_{i_r} - \mathbf{c}_j = \mathbf{0}.
\]

This implies that the set B spans R(A). Since the only linear combination
of the vectors e1, e2, ..., er which is equal to the zero vector is the
one with all coefficients equal to zero, the same is true for the vectors
c_{i1}, c_{i2}, ..., c_{ir}. Therefore these vectors are linearly independent, and
the set B is a basis of R(A).


Although the row space and the column space of an m × n matrix may
be subspaces of different Euclidean spaces, RS(A) ⊆ R^n and CS(A) ⊆
R^m, it turns out that these spaces have the same dimension. Try to see
why this might be true by looking again at the example in the previous
section, before reading the proof of the following theorem.


Theorem 6.55 If A is an m × n matrix, then

dim(RS(A)) = dim(CS(A)) = rank(A).


Proof. By Theorem 6.54, the dimension of the column space, or range,
is equal to the number of leading ones in the reduced row echelon form
of A.

If R denotes the reduced row echelon form of A, then RS(A) =
RS(R) and a basis of this space is given by the non-zero rows of R;
that is, the rows with the leading ones. The reason this works is that:
(i) row operations are such that, at any stage in the procedure, the row
space of the reduced matrix is equal to the row space of the original
matrix (since the rows of the reduced matrix are linear combinations of
the original rows), and (ii) the non-zero rows of an echelon matrix are
linearly independent (since each has a one in a position where the rows
below it all have a zero).

Therefore, the dimension of the row space is also equal to the
number of leading ones in R; that is,

dim(RS(A)) = dim(CS(A)) = dim(R(A)) = rank(A).


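Example 6.56 Let B be the matrix

\[
B = \begin{pmatrix} 1 & 1 & 2 & 1 \\ 2 & 0 & 1 & 1 \\ 9 & -1 & 3 & 4 \end{pmatrix}.
\]

The reduced row echelon form of the matrix is (verify this!)

\[
E = \begin{pmatrix} 1 & 0 & \tfrac{1}{2} & \tfrac{1}{2} \\ 0 & 1 & \tfrac{3}{2} & \tfrac{1}{2} \\ 0 & 0 & 0 & 0 \end{pmatrix}.
\]

The leading ones in this echelon matrix are in the first and second
columns, so a basis for R(B) can be obtained by taking the first and
second columns of B. (Note: ‘columns of B’, not of the echelon matrix!)
Therefore, a basis for R(B) is

\[
\left\{ \begin{pmatrix} 1 \\ 2 \\ 9 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} \right\}.
\]

A basis of the row space of B consists of the two non-zero rows of the
reduced matrix or, alternatively, the first two rows of the original matrix
(written as vectors):

\[
\left\{ \begin{pmatrix} 1 \\ 0 \\ \tfrac{1}{2} \\ \tfrac{1}{2} \end{pmatrix},
\begin{pmatrix} 0 \\ 1 \\ \tfrac{3}{2} \\ \tfrac{1}{2} \end{pmatrix} \right\}
\quad\text{or}\quad
\left\{ \begin{pmatrix} 1 \\ 1 \\ 2 \\ 1 \end{pmatrix},
\begin{pmatrix} 2 \\ 0 \\ 1 \\ 1 \end{pmatrix} \right\}.
\]

Note that the column space is a two-dimensional subspace of R^3 (a
plane) and the row space is a two-dimensional subspace of R^4. The
columns of B and E satisfy the same linear dependence relations,
which can be easily read from the reduced echelon form of the matrix,

\[
\mathbf{c}_3 = \tfrac{1}{2}\mathbf{c}_1 + \tfrac{3}{2}\mathbf{c}_2, \qquad
\mathbf{c}_4 = \tfrac{1}{2}\mathbf{c}_1 + \tfrac{1}{2}\mathbf{c}_2.
\]


Activity 6.57 Check that the columns of B satisfy these same linear
dependence relations.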

There is a very important relationship between the rank and nullity of 
a matrix, known as the rank-nullity theorem or dimension theorem for 
matrices. This theorem states that 


dim(R(A)) + dim(N(A)) = n, 


where n is the number of columns of the matrix A. 

We have already seen some indications of this result in our
considerations of linear systems (Section 4.2). Recall that if an m × n matrix
A has rank r, then the general solution to the system Ax = 0 involves
n − r ‘free parameters’. Specifically, the general solution takes the form

\[
\mathbf{x} = s_1\mathbf{v}_1 + s_2\mathbf{v}_2 + \cdots + s_{n-r}\mathbf{v}_{n-r}, \qquad s_i \in \mathbb{R},
\]

where v1, v2, ..., v_{n−r} are themselves solutions of the system Ax = 0.
But the set of solutions of Ax = 0 is precisely the null space N(A).
Thus, the null space is spanned by the n − r vectors v1, v2, ..., v_{n−r},
and so its dimension is at most n − r. In fact, it turns out that its
dimension is precisely n − r. That is,

nullity(A) = n − rank(A).

To see this, we need to show that the vectors v1, v2, ..., v_{n−r} are linearly
independent. Because of the way in which these vectors arise (look at
the examples we worked through), it will be the case that for each of
them there is some position where that vector will have an entry equal
to 1 and the entry in that same position for all the other vectors will
be 0. From this we can see that no non-trivial linear combination of
them can be the zero vector, so they are linearly independent. We have
therefore proved the following central result:


Theorem 6.58 (Rank-nullity theorem) For an m x n matrix A, 
rank(A) + nullity(A) = n. 
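The theorem can be checked numerically on the two matrices used in this section. The sketch below is illustrative only, with helper names of our own (`rref`, `rank`, `null_space_basis`): it computes the rank from the leading ones, builds a null space basis from the free variables, confirms that these vectors solve the system and are independent, and also compares rank(M) with the rank of the transpose, as in Theorem 6.55.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and leading-one columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def rank(rows):
    """Number of leading ones in the reduced row echelon form."""
    return len(rref(rows)[1])

def null_space_basis(rows):
    """One solution of Mx = 0 for each free variable."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (j for j in range(n) if j not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row, p in zip(R, pivots):
            v[p] = -row[free]
        basis.append(v)
    return basis

A = [[1, 2, 1, 1, 2], [0, 1, 2, 1, 4], [-1, 3, 9, 1, 9], [0, 1, 2, 0, 1]]
B = [[1, 1, 2, 1], [2, 0, 1, 1], [9, -1, 3, 4]]

report = {}
for name, M in (("A", A), ("B", B)):
    n = len(M[0])
    basis = null_space_basis(M)
    # every basis vector must solve Mx = 0
    solves = all(sum(a * x for a, x in zip(row, u)) == 0
                 for u in basis for row in M)
    # nullity as the dimension of the span of the basis vectors
    nullity = rank(basis) if basis else 0
    transpose = [list(col) for col in zip(*M)]
    report[name] = (rank(M), nullity, n, rank(transpose))
    print(name, report[name], solves)
```

Each tuple lists rank, nullity, number of columns and the rank of the transpose; in both cases the first two entries add up to the third, and the first and last agree.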


Activity 6.59 Find a basis of the null space of the matrix B in
Example 6.56 and verify the rank-nullity theorem:

\[
B = \begin{pmatrix} 1 & 1 & 2 & 1 \\ 2 & 0 & 1 & 1 \\ 9 & -1 & 3 & 4 \end{pmatrix}.
\]

Use the null space basis vectors to obtain the linear dependence relations
between the columns of B (which we found earlier using the columns
of E).


This is a good time to recall that the row space, RS(A), and the null
space, N(A), of an m × n matrix A are orthogonal subspaces of R^n. As
we saw in Section 5.3.1, any vector in one of the spaces is orthogonal
to any vector in the other. Therefore, the only vector in both spaces is
the zero vector; that is, the intersection RS(A) ∩ N(A) = {0}.


Activity 6.60 Prove this last statement. 


6.6 Learning outcomes 


You should now be able to: 


• explain what is meant by linear independence and linear dependence

• determine whether a given set of vectors is linearly independent or
  linearly dependent, and in the latter case, find a non-trivial linear
  combination of the vectors which equals the zero vector

• explain what is meant by a basis

• find a basis for a linear span

• find a basis for the null space, range and row space of a matrix from
  its reduced row echelon form

• explain what it means for a vector space to be finite-dimensional
  and what is meant by the dimension of a finite-dimensional vector
  space

• explain how rank and nullity are defined, and the relationship
  between them (the rank-nullity theorem).


6.7 Comments on activities 


Activity 6.4 Since 2p — q = 0, the vectors are linearly dependent. 


Activity 6.11 As noted,

a1 v1 + a2 v2 + ··· + am vm = b1 v1 + b2 v2 + ··· + bm vm

if and only if

(a1 − b1)v1 + (a2 − b2)v2 + ··· + (am − bm)vm = 0.

But since the vectors are linearly independent, this can be true only if
a1 − b1 = 0, a2 − b2 = 0, and so on. That is, for each i, we must have
a_i = b_i.


Activity 6.14 We need to find constants a_i such that

\[
a_1\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
+ a_2\begin{pmatrix} 2 \\ 1 \\ 5 \end{pmatrix}
+ a_3\begin{pmatrix} 4 \\ 5 \\ 11 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
\]

This is equivalent to the matrix equation Ax = 0, where A is the matrix
whose columns are these vectors. Row reducing A, we find

\[
A = \begin{pmatrix} 1 & 2 & 4 \\ 2 & 1 & 5 \\ 3 & 5 & 11 \end{pmatrix}
\longrightarrow \cdots \longrightarrow
\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
\]

Setting the non-leading variable equal to t, the general solution of
Ax = 0 is

\[
\mathbf{x} = t\begin{pmatrix} -2 \\ -1 \\ 1 \end{pmatrix}, \qquad t \in \mathbb{R}.
\]

Taking t = 1, then Ax = −2v1 − v2 + v3 = 0, which is a linear
dependence relation.


Activity 6.16 Looking at the components of the vector equation

\[
a_1\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
+ a_2\begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}
+ \cdots
+ a_n\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix},
\]

you can see that the positions of the ones and zeros in the vectors lead to
the equations a1 = 0 from the first component, a2 = 0 from the second
component and so on, so that a_i = 0 (1 ≤ i ≤ n) is the only possible
solution and the vectors are linearly independent. (Alternatively, the
matrix A = (e1, e2, ..., en) is the n × n identity matrix, so the only
solution to Ax = 0 is the trivial solution, proving that the vectors are
linearly independent.)


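Activity 6.19 The general solution to the system is

\[
\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}
= t\begin{pmatrix} -3/2 \\ -1/2 \\ 1 \end{pmatrix}, \qquad t \in \mathbb{R}.
\]

Taking t = −1, for instance, and multiplying out the equation Ax = 0,
we see that

\[
\tfrac{3}{2}\mathbf{c}_1 + \tfrac{1}{2}\mathbf{c}_2 - \mathbf{c}_3 = \mathbf{0}
\]

and hence

\[
\mathbf{c}_3 = \tfrac{3}{2}\mathbf{c}_1 + \tfrac{1}{2}\mathbf{c}_2,
\]

where c1, c2, c3 denote the columns of A.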

Activity 6.35 The vector y belongs to W since it satisfies the equation
of the plane, (5) + (1) − 3(2) = 0. For the coordinates in the basis B,
you need to solve the vector equation

\[
\begin{pmatrix} 5 \\ 1 \\ 2 \end{pmatrix}
= \alpha\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}
+ \beta\begin{pmatrix} 0 \\ 3 \\ 1 \end{pmatrix}
\]

for constants α and β. Because of the positions of the zeros and ones
in this basis, this can be done by inspection. From row 1 (equating the
first components), we must have α = 5, and from row 3, we must have
β = 2. Checking the middle row, 1 = 5(−1) + 2(3). Therefore,

\[
[\mathbf{y}]_B = \begin{bmatrix} 5 \\ 2 \end{bmatrix}_B.
\]


Activity 6.44 If S = {w1, w2, ..., wr} is a linearly independent set of
vectors in W, then we can state that the only linear combination

a1 w1 + a2 w2 + ··· + ar wr = 0

is the trivial one, with all a_i = 0. But all the vectors in W are also in V,
and this statement still holds true, so S is a linearly independent set of
vectors in V.


Activity 6.47 The set S is linearly independent since v3 ∉ Lin{v1, v2},
and it contains precisely 3 = dim(R^3) vectors.


Activity 6.50 Consider any linear combination of these vectors which
is equal to the zero vector,

\[
a_1\begin{pmatrix} 1 \\ 0 \\ -3 \\ 0 \\ -3 \end{pmatrix}
+ a_2\begin{pmatrix} 0 \\ 1 \\ 2 \\ 0 \\ 1 \end{pmatrix}
+ a_3\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 3 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.
\]

Clearly, the only solution is the trivial one, a1 = a2 = a3 = 0.

This will happen for any set of non-zero rows from the reduced
row echelon form of a matrix. Since the vectors arise as the non-zero
rows of a matrix in reduced row echelon form, each vector contains
a leading one as its first non-zero component, and all the other vectors
have zeros in those positions. So in this example the equations relating
the first, second and fourth components (the positions of the leading
ones) tell us that all coefficients are zero and the vectors are linearly
independent.


Activity 6.59 A general solution of the system of equations Bx = 0 is 


| 
NI=NI= 


0 = §;U; + S202. 
1 
The set {u_1, u_2} is a basis of the null space of B, so dim(N(B)) = 2. 
From the example, rank(B) = 2. The matrix B has n = 4 columns: 
rank(B) + nullity(B) = 2 + 2 = 4 = n. 


The basis vectors of the null space give the same linear dependence 
relations between the column vectors as those given in the example. 
Since Au_1 = 0 and Au_2 = 0, 


1 3 1 1 
Au, = =i — 5° +ce;=0 and Au = Eo — 7% +c4=0. 


Activity 6.60 Let A be an m × n matrix. If x ∈ RS(A), then ⟨x, v⟩ = 0 
for all v ∈ N(A), and if x ∈ N(A), then ⟨w, x⟩ = 0 for all w ∈ RS(A). 
Therefore, if x is in both, we must have ⟨x, x⟩ = ||x||^2 = 0. But this 
implies that x is the zero vector; that is, RS(A) ∩ N(A) = {0}. 


6.8 Exercises 


Exercise 6.1 Show that the vectors x_1, x_2, x_3 given below are linearly 
independent: 


(iQ G) 


Express the vector 


as a linear combination of them. 


Exercise 6.2 Let 


-00 -0 


Find a vector x_3 such that {x_1, x_2, x_3} is a linearly independent set of 
vectors. 

Find a condition that a, b, c must satisfy for the set of vectors 
{x_1, x_2, v} to be linearly dependent. 


Exercise 6.3 Using the definition of linear independence, show that 
any non-empty subset of a linearly independent set of vectors is linearly 
independent. 


Exercise 6.4 Let S = {v_1, v_2, ..., v_n} be a set of n vectors in R^n and let 
A be the matrix whose columns are the vectors v_1, v_2, ..., v_n. Explain 
why the set S is linearly independent if and only if |A| ≠ 0. 


Exercise 6.5 Show that the following vectors are linearly dependent by 
finding a non-trivial linear combination of the vectors that equals the 
zero vector. 


1 0 4 9 
2 —1 —11 2 
1 ies Se 5 , 1 
2 4 —1 —3 


Exercise 6.6 Prove that if n > m, then any set of n vectors in R^m is 
linearly dependent. 


Exercise 6.7 Let A be any matrix. Let v_1 and v_2 be two non-zero 
vectors and suppose that Av_1 = 2v_1 and Av_2 = 5v_2. Prove that {v_1, v_2} 
is linearly independent. (Hint: Assume α_1 v_1 + α_2 v_2 = 0. Multiply this 
equation through by A to get a second equation for v_1 and v_2. Then 
solve the two equations simultaneously.) 

Can you generalise this result? 


Exercise 6.8 Consider the sets 


ao l a Gh 


What subspace of R^3 is Lin(U)? Lin(W)? Find a basis for each subspace 
and show that one of them is a plane in R^3. Find a Cartesian equation 
for the plane. 


Exercise 6.9 Write down a basis for the xz-plane in R^3. 




Exercise 6.10 Let B be the set of vectors B = {v_1, v_2, v_3}, where 
v_1 = (1, 1, 0)^T, v_2 = (4, 0, 3)^T, v_3 = (3, 5, 1)^T. Show that B is a basis 
of R^3. 
Let w = (−1, 7, 5)^T and e_1 = (1, 0, 0)^T. Find the coordinates of w 
and e_1 with respect to the basis B. 


Exercise 6.11 Let V be a vector space with a basis B = 
{v_1, v_2, ..., v_n}. Show that for any u, w ∈ V, 

\[
[\alpha\mathbf{u} + \beta\mathbf{w}]_B = \alpha[\mathbf{u}]_B + \beta[\mathbf{w}]_B .
\]


Exercise 6.12 Consider the matrix 


ty 2 i 3 
sm 2 3 0 i) 
SH 45. 529: 3 


Find a basis of the row space of A, RS(A), and the column space, CS(A). 
State why CS(A) is a plane in R^3, and find a Cartesian equation of this 
plane. 

State the rank-nullity theorem (the dimension theorem for matrices), 
ensuring that you define each term. Use it to determine the dimension 
of the null space, N(A). 

For what real values of a is the vector 


-1 
b(a) = | a ) 
a 


in the range of A? Write down any vectors in R(A) of this form. 


Exercise 6.13 A matrix A is said to have full column rank if and only 
if the columns of A are linearly independent. If A is an m × k matrix 
with full column rank, show that: 


(1) A^T A is a symmetric k × k matrix, 
(2) A^T A is invertible. 


1 -2 
Then verify the above results for the matrix M = [ 0 ; 
1 1 


Exercise 6.14 Let B be an m × k matrix whose row space, RS(B), is 
a plane in R^3 with Cartesian equation 4x − 5y + 3z = 0. 

From the given information, can you determine either m or k for 
the matrix B? If it is possible, do so. 

Can you determine the null space of B? If so, write down a general 
solution of Bx = 0. 


Exercise 6.15 Let S be the vector space of all infinite sequences of real 
numbers. Let W be the subset which consists of sequences for which 
all entries beyond the third are zero. Show that W is a subspace of S of 
dimension 3. 


6.9 Problems 


Problem 6.1 Determine which of the following sets of vectors are 
linearly independent. 


n= eeh) 
O0 OGO 
(0.8.0) 


Problem 6.2 Which of the following sets of vectors in R^4 are linearly 
independent? 


1 2 1 2 1 
2 0 2 0 1 
S1 = riot 1 i S2 = Pelate : 
3 2 3 2 2 
1 2 4 
2 0 4 
s- 1 9 =] 9 1 l 
3 2 8 
1 2 4 1 
2 0 4 1 
OE WN ot eh eat cpt Seale 
3 2 8 2 


Problem 6.3 Show that the following set of vectors is linearly 
dependent 


1 2 4 1 3 
snee (0) C)-0-0-O) 
1 —l 1 1 0 


Write down a largest subset W of S which is a linearly independent set 
of vectors. Express any of the vectors which are in S but not in W as a 
linear combination of the vectors in W. 


Problem 6.4 What does it mean to say that a set of vectors 
{v_1, v_2, ..., v_n} is linearly dependent? 

Given the following vectors 

\[
\mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \\ 0 \\ -1 \end{pmatrix}, \quad
\mathbf{v}_2 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \quad
\mathbf{v}_3 = \begin{pmatrix} 4 \\ 5 \\ 3 \\ 2 \end{pmatrix}, \quad
\mathbf{v}_4 = \begin{pmatrix} 5 \\ 5 \\ 2 \\ 2 \end{pmatrix},
\]

show that {v_1, v_2, v_3, v_4} is linearly dependent. 

Is it possible to express v_4 as a linear combination of the other 
vectors? If so, do this. If not, explain why not. What about the vector 
v_3? 


Problem 6.5 For each of the sets S_i of vectors given below, find a basis 
of the vector space Lin(S_i) and state its dimension. Describe geometrically 
any sets Lin(S_i) which are proper subspaces of a Euclidean space 
R^n, giving Cartesian equations for any lines and planes. 


e a{(4).9-).2} 
OLOLAN 


2 4 
0 4 

S4 = Aabla 
3 2 8 


Problem 6.6 Which of the following sets are a basis for R?? (State 
reasons for your answers.) 


s s=10) Go} 
e eC mam igel au). 


Problem 6.7 Find the coordinates of the vector (1, 2, 1)^T with respect 
to each of the following bases for R^3: 


d Anoo 


Problem 6.8 Find a basis for each of the following subspaces of R^3. 


(a) The plane x — 2y + z = 0; 
(b) The yz-plane. 


Problem 6.9 Prove that the set 

\[
H = \left\{ \begin{pmatrix} 2t \\ t \\ 3t \end{pmatrix} : t \in \mathbb{R} \right\}
\]

is a subspace of R^3. 

Show that every vector w ∈ H is a unique linear combination of 
the vectors 

\[
\mathbf{v}_1 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}
\quad\text{and}\quad
\mathbf{v}_2 = \begin{pmatrix} 0 \\ 1 \\ 5 \end{pmatrix}.
\]

Is {v_1, v_2} a basis of the subspace H? If yes, state why. If no, write 
down a basis of H. State the dimension of H. 

Let G be the subspace G = Lin{v_1, v_2}. Is {v_1, v_2} a basis of G? 
Why or why not? 

State a geometric description of each of the subspaces H and G. 
What is the relationship between them? 


Problem 6.10 Find the general solution of each of the following 
systems of equations in the form x = p + α_1 s_1 + ··· + α_{n−r} s_{n−r}, where 
p is a particular solution of the system and {s_1, ..., s_{n−r}} is a basis for 
the null space of the coefficient matrix. 

\[
A\mathbf{x} = \mathbf{b}:\;
\begin{cases}
x_1 + x_2 + x_3 + x_4 = 4 \\
2x_1 + x_3 - x_4 = 2 \\
2x_2 + x_3 + 3x_4 = 6
\end{cases}
\qquad
B\mathbf{x} = \mathbf{d}:\;
\begin{cases}
x_1 + 2x_2 - x_3 - x_4 = 3 \\
x_1 - x_2 - 2x_3 - x_4 = 1 \\
2x_1 + x_2 - x_3 = 3.
\end{cases}
\]

Find the set of all b ∈ R^3 such that Ax = b is consistent. 
Find the set of all d ∈ R^3 such that Bx = d is consistent. 


Problem 6.11 Let 


an ae ae E 
aly 3 0 2 28 
AS a A 
1 2 5 13 5 


Find a basis for the range of A. Find a basis of the row space and the 
null space of A. Verify the rank-nullity theorem for the matrix A. 


Problem 6.12 Find a basis of the row space, a basis of the range, and 
a basis of the null space of the matrix 


1 2 13 0 
s= (0 1 1 1 1). 
| See coe | L 


Find the rank of B and verify the rank-nullity theorem. 

Let b = c_1 + c_5, the sum of the first and last column of the matrix 
B. Without solving the system, use the information you have obtained 
to write down a general solution of the system of equations Bx = b. 


Problem 6.13 Find the rank of the matrix 


1 0 1 
“=i 1 a 
0 =f -1 


Find a basis of the row space and a basis of the column space of A. 
Show that RS(A) and CS(A) are each planes in R^3. Find Cartesian 
equations for these planes and hence show that they are two different 
subspaces. 

Find the null space of A and verify the rank—nullity theorem. Show 
that the basis vectors of the null space are orthogonal to the basis vectors 
of the row space of A. 

Without solving the equations, determine if the systems of equations 
Ax = b_1 and Ax = b_2 are consistent, where 

\[
\mathbf{b}_1 = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}
\quad\text{and}\quad
\mathbf{b}_2 = \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}.
\]

If the system is consistent, then find the general solution. If possible, 
express each of b_1 and b_2 as a linear combination of the columns of A. 


Problem 6.14 A portion of the matrix A and the reduced row echelon 
form of A are shown below: 

\[
A = \begin{pmatrix} 1 & 4 & * & * \\ 2 & -1 & * & * \\ 3 & 2 & * & * \end{pmatrix}
\longrightarrow \cdots \longrightarrow
\begin{pmatrix} 1 & 0 & -1 & 5 \\ 0 & 1 & 3 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
\]

Find a basis of the row space of A, RS(A), a basis of the range of A, 
R(A), and a basis of the null space of A, N(A). 

Let b = (9, 0, a)^T for some a ∈ R. The matrix equation Ax = b 
represents how many equations in how many unknowns? Find the 
value of a for which the system of equations Ax = b is consistent. 

Find, if possible, the missing columns of A. 


Problem 6.15 Let B be a 3 × 4 matrix whose null space is 
1 


A 


N(B)=4x=t : te 


4 
Determine the rank of B. Find the range of B, R(B). 
Consider the row space of B, RS(B). Show that the vector v_1 = 
(4, 0, 0, −1)^T is in RS(B). Extend {v_1} to a basis of RS(B), and justify 
that your set of vectors is a basis. 


7 


Linear transformations and 
change of basis 


We now turn our attention to special types of functions between vector 
spaces known as linear transformations. We will look at the matrix 
representations of linear transformations between Euclidean vector 
spaces, and discuss the concept of similarity of matrices. These ideas 
will then be employed to investigate change of basis and change of coor- 
dinates. This material provides the fundamental theoretical underpin- 
ning for the technique of diagonalisation, which has many applications, 
as we shall see later. 


7.1 Linear transformations 


A function from one vector space V to a vector space W is a rule 
which assigns to every vector v ∈ V a unique vector w ∈ W. If this 
function between vector spaces is linear, then it is known as a linear 
transformation (or linear mapping or linear function). 


Definition 7.1 (Linear transformation) Suppose that V and W are 
vector spaces. A function T : V → W is linear if for all u, v ∈ V and 
all α ∈ R: 


1. T(u + v) = T(u) + T(v), and 
2. T(αu) = αT(u). 


A linear transformation is a linear function between vector spaces. 


A linear transformation of a vector space V to itself, T : V → V, is 
often known as a linear operator. 

If T is linear, then for all u, v ∈ V and α, β ∈ R, 

\[
T(\alpha\mathbf{u} + \beta\mathbf{v}) = \alpha T(\mathbf{u}) + \beta T(\mathbf{v}).
\]

This single condition implies the two in the definition, and is implied 
by them. 


Activity 7.2 Prove that this single condition is equivalent to the two of 
the definition. 


So a linear transformation maps linear combinations of vectors to the 
same linear combinations of the image vectors. In this sense, it preserves 
the ‘linearity’ of a vector space. 

In particular, if T : V → W, then for 0 ∈ V, T(0) = 0 ∈ W. That 
is, a linear transformation from V to W maps the zero vector in V to the 
zero vector in W. This can be seen in a number of ways. For instance, 
take any x ∈ V. Then T(0) = T(0x) = 0T(x) = 0. 


7.1.1 Examples 


Example 7.3 To get an idea of what a linear mapping might look like, 
let us look first at R. What mappings F : R → R are linear? 

The function F_1(x) = px for any p ∈ R is a linear transformation, 
since for any x, y ∈ R, α, β ∈ R, we have 

\[
F_1(\alpha x + \beta y) = p(\alpha x + \beta y) = \alpha(px) + \beta(py) = \alpha F_1(x) + \beta F_1(y).
\]

But neither of the functions F_2(x) = px + q (for p, q ∈ R, q ≠ 0) nor 
F_3(x) = x^2 is linear. 
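
As a quick numerical illustration of Example 7.3, the following sketch tests the additivity property T(x + y) = T(x) + T(y) at a single pair of real numbers. The values p = 3, q = 1 and the pair (1, 2) are arbitrary choices made here, not taken from the text. A single failing pair is enough to show that a function is not linear; a single passing pair proves nothing on its own.

```python
def is_additive_at(f, x, y, tol=1e-12):
    """Check f(x + y) == f(x) + f(y) for one particular pair (x, y)."""
    return abs(f(x + y) - (f(x) + f(y))) <= tol


p, q = 3.0, 1.0  # hypothetical constants, chosen with q != 0


def F1(x):
    return p * x          # linear


def F2(x):
    return p * x + q      # not linear when q != 0


def F3(x):
    return x ** 2         # not linear


# Test each function at the single pair x = 1, y = 2.
results = {f.__name__: is_additive_at(f, 1.0, 2.0) for f in (F1, F2, F3)}
print(results)
```

Here F_2 fails because F_2(1 + 2) = 10 while F_2(1) + F_2(2) = 11, and F_3 fails because 9 ≠ 1 + 4.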


Activity 7.4 Show this. Use a specific example of real numbers to show 
that neither of these functions satisfies the property 

\[
T(x + y) = T(x) + T(y) \quad\text{for all } x, y \in \mathbb{R}.
\]


Example 7.5 Suppose that A is an m × n matrix. Let T be the function 
given by T(x) = Ax for x ∈ R^n. That is, T is simply multiplication 
by A. Then T is a linear transformation, T : R^n → R^m. This is easily 
checked, as follows: first, 

\[
T(\mathbf{u} + \mathbf{v}) = A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v} = T(\mathbf{u}) + T(\mathbf{v}),
\]

and, second, 

\[
T(\alpha\mathbf{u}) = A(\alpha\mathbf{u}) = \alpha A\mathbf{u} = \alpha T(\mathbf{u}).
\]

So the two ‘linearity’ conditions are satisfied. We call T the linear 
transformation corresponding to A, and sometimes denote it by T_A to 
identify it as such. 


Example 7.6 (More complicated) Let us take V = R^n and take W 
to be the vector space of all functions f : R → R (with pointwise 
addition and scalar multiplication). Define a function T : R^n → W as 
follows: 

\[
T(\mathbf{u}) = T\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}
= p_{u_1,u_2,\ldots,u_n} = p_{\mathbf{u}},
\]

where p_u = p_{u_1,u_2,...,u_n} is the polynomial function given by 

\[
p_{\mathbf{u}}(x) = u_1 x + u_2 x^2 + \cdots + u_n x^n .
\]

Then T is a linear transformation. To check this, we need to verify that 
T(u + v) = T(u) + T(v) and T(αu) = αT(u). 
Now, T(u + v) = p_{u+v}, T(u) = p_u, and T(v) = p_v, so we need to check 
that p_{u+v} = p_u + p_v. This is in fact true, since, for all x, 

\[
\begin{aligned}
p_{\mathbf{u}+\mathbf{v}}(x) &= p_{u_1+v_1,\ldots,u_n+v_n}(x) \\
&= (u_1 + v_1)x + (u_2 + v_2)x^2 + \cdots + (u_n + v_n)x^n \\
&= (u_1 x + \cdots + u_n x^n) + (v_1 x + \cdots + v_n x^n) \\
&= p_{\mathbf{u}}(x) + p_{\mathbf{v}}(x) \\
&= (p_{\mathbf{u}} + p_{\mathbf{v}})(x).
\end{aligned}
\]

The fact that for all x, p_{u+v}(x) = (p_u + p_v)(x) means that the functions 
p_{u+v} and p_u + p_v are identical. The proof that T(αu) = αT(u) is 
similar. 


Activity 7.7 Do this. Prove that T(αu) = αT(u). 


7.1.2 Linear transformations and matrices 


In this section, we consider only linear transformations from R^n to R^m 
for some m and n. But much of what we say can be extended to linear 
transformations mapping from any finite-dimensional vector space to 
any other finite-dimensional vector space. 

We have seen that any m × n matrix A defines a linear transformation 
T : R^n → R^m given by T(v) = Av. 

There is a reverse connection: for every linear transformation 
T : R^n → R^m, there is a matrix A such that T(v) = Av. In this context, 
we sometimes denote the matrix by A_T in order to identify it as the 
matrix corresponding to T. (Note that in the expression A_T, T refers 
to a linear transformation. This should not be confused with A^T, the 
transpose of the matrix A.) 


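Theorem 7.8 Suppose that T : R^n → R^m is a linear transformation. 
Let {e_1, e_2, ..., e_n} denote the standard basis of R^n and let A be the 
matrix whose columns are the vectors T(e_1), T(e_2), ..., T(e_n): that is, 

\[
A = (T(\mathbf{e}_1)\; T(\mathbf{e}_2)\; \ldots\; T(\mathbf{e}_n)).
\]

Then, for every x ∈ R^n, T(x) = Ax. 


Proof. Let x = (x_1, x_2, ..., x_n)^T be any vector in R^n. Then 

\[
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= x_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
+ x_2 \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}
+ \cdots
+ x_n \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}
= x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n .
\]

Then by the linearity properties of T we have 

\[
\begin{aligned}
T(\mathbf{x}) &= T(x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n) \\
&= T(x_1\mathbf{e}_1) + T(x_2\mathbf{e}_2) + \cdots + T(x_n\mathbf{e}_n) \\
&= x_1 T(\mathbf{e}_1) + x_2 T(\mathbf{e}_2) + \cdots + x_n T(\mathbf{e}_n).
\end{aligned}
\]

But this is just a linear combination of the columns of A, so we have 
(by Theorem 1.38), 

\[
T(\mathbf{x}) = (T(\mathbf{e}_1)\; T(\mathbf{e}_2)\; \ldots\; T(\mathbf{e}_n))\,\mathbf{x} = A\mathbf{x},
\]

exactly as we wanted. 


Thus, to each matrix A there corresponds a linear transformation T_A, 
and to each linear transformation T there corresponds a matrix A_T. 
Note that the matrix A we found was determined by using the standard 
basis in both vector spaces; later in this chapter we will generalise this 
to use other bases. 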




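Example 7.9 Let T : R^3 → R^3 be the linear transformation given by 

\[
T\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} x + y + z \\ x - y \\ x + 2y - 3z \end{pmatrix}.
\]

We can find the image of a vector, say u = (1, 2, 3)^T, by substituting 
its components into the definition, so that, for example, 
T(u) = (6, −1, −4)^T. 

To find the matrix of this linear transformation, we need the images 
of the standard basis vectors. We have 

\[
T(\mathbf{e}_1) = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad
T(\mathbf{e}_2) = \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}, \quad
T(\mathbf{e}_3) = \begin{pmatrix} 1 \\ 0 \\ -3 \end{pmatrix}.
\]

The matrix representing T is A = (T(e_1) T(e_2) T(e_3)), which is 

\[
A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 1 & 2 & -3 \end{pmatrix}.
\]

Notice that the entries of the matrix A are just the coefficients of x, y, z 
in the definition of T. 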
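
The construction in Theorem 7.8 can be sketched in a few lines of Python. The helper names `matrix_of` and `mat_vec` are introduced here purely for illustration; the transformation T is the one from Example 7.9. The sketch builds the matrix column by column from the images of the standard basis vectors and then compares Au with T(u).

```python
def T(v):
    """The linear transformation of Example 7.9."""
    x, y, z = v
    return [x + y + z, x - y, x + 2 * y - 3 * z]


def matrix_of(transform, n):
    """Matrix whose columns are the images of e_1, ..., e_n (as a list of rows)."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]
        columns.append(transform(e_j))
    # The images are columns, so transpose to obtain the rows of the matrix.
    return [list(row) for row in zip(*columns)]


def mat_vec(A, v):
    """Matrix-vector product Av."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = matrix_of(T, 3)
u = [1, 2, 3]
print(A)                     # rows of the matrix representing T
print(T(u), mat_vec(A, u))   # both give the image of u
```

Both ways of computing the image of u agree, which is exactly the statement T(x) = Ax for this particular vector.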


Activity 7.10 Calculate the matrix product Au for the vector u = 
(1, 2, 3)^T and the matrix A above, and observe that this has exactly 
the same effect as substituting the components. 


7.1.3 Linear transformations on R^2 


Linear transformations from R^2 to R^2 have the advantage that we can 
‘observe’ them as mappings from one copy of the Cartesian plane to 
another. For example, we can visualise a reflection in the x axis, which 
is given by 

\[
T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ -y \end{pmatrix},
\]

with matrix 

\[
A_T = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]

We know that linear transformations preserve linear combinations, and 
we can interpret this geometrically by saying that lines are mapped 
to lines and parallelograms are mapped to parallelograms. Because 
we know that any linear transformation T : R^2 → R^2 corresponds to 
multiplication by a matrix A, we can describe the effects of these on the 
plane. As another example, consider the linear transformation 

\[
T(\mathbf{x}) = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}.
\]

This has the effect of stretching the plane away from the origin by a 
factor of 2 in the x direction and by a factor of 3 in the y direction. If 
we look at the effect of this linear transformation on the parallelogram 
whose sides are the vectors e_1 and e_2 (a unit square), we find that the 
image is a parallelogram (a rectangle) whose corresponding sides are 
2e_1 and 3e_2. (In this sense, the linear transformation can be described 
as an ‘enlargement’.) 

What about a rotation? If we ‘rotate’ the plane about the origin 
anticlockwise by an angle θ, the unit square with sides e_1 and e_2 will be 
rotated. To find the matrix A which represents this linear transformation, 
we need to find the images of the standard basis vectors e_1 and e_2. Let 

\[
T(\mathbf{e}_1) = \begin{pmatrix} a \\ c \end{pmatrix}, \qquad
T(\mathbf{e}_2) = \begin{pmatrix} b \\ d \end{pmatrix}.
\]

We want to determine the coordinates a, c and b, d. It is helpful to draw 
a diagram of R^2 such as Figure 7.1, with the images T(e_1) and T(e_2) 
after rotation anticlockwise by an angle θ, 0 < θ < π/2. 

[Figure 7.1 A rotation] 

The vectors 

\[
T(\mathbf{e}_1) = \begin{pmatrix} a \\ c \end{pmatrix}
\quad\text{and}\quad
T(\mathbf{e}_2) = \begin{pmatrix} b \\ d \end{pmatrix}
\]

are orthogonal and each has length 1 since they are the rotated standard 
basis vectors. We drop a perpendicular from the point (a, c) to the 
x axis, forming a right-angled triangle with angle θ at the origin. Since 
the x coordinate of the rotated vector is a and the y coordinate is c, the 
side opposite the angle θ has length c and the side adjacent to the angle 
θ has length a. The hypotenuse of this triangle (which is the rotated 
unit vector e_1) has length equal to 1. We therefore have a = cos θ and 
c = sin θ. Similarly, we can drop the perpendicular from the point (b, d) 
to the x axis and observe that the angle opposite the x axis is equal to θ. 
Again, basic trigonometry tells us that the x coordinate is b = −sin θ 
(it has length sin θ and is in the negative x direction), and the height is 
d = cos θ. Therefore, 

\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
= \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\]

is the matrix of rotation anticlockwise by an angle θ. Although we have 
shown this using an angle 0 < θ < π/2, the argument can be extended to 
any angle θ. 
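
The rotation matrix can be checked numerically with a short sketch. The angle π/6 is an arbitrary choice made for this illustration, and the helper names are not from the text. The code confirms that the images of e_1 and e_2 are the columns (cos θ, sin θ) and (−sin θ, cos θ), and that they are orthogonal unit vectors, up to floating-point rounding.

```python
import math


def rotation_matrix(theta):
    """Matrix of anticlockwise rotation about the origin by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def mat_vec(A, v):
    """Matrix-vector product Av."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


theta = math.pi / 6  # hypothetical angle chosen for the illustration
A = rotation_matrix(theta)

image_e1 = mat_vec(A, [1.0, 0.0])  # first column of A
image_e2 = mat_vec(A, [0.0, 1.0])  # second column of A

# The rotated basis vectors should remain orthogonal and of length 1.
dot = image_e1[0] * image_e2[0] + image_e1[1] * image_e2[1]
length_e1 = math.hypot(*image_e1)
length_e2 = math.hypot(*image_e2)
print(image_e1, image_e2, dot, length_e1, length_e2)
```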


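Example 7.11 If θ = π/4, then rotation anticlockwise by π/4 radians is 
given by the matrix 

\[
B = \begin{pmatrix} \cos\frac{\pi}{4} & -\sin\frac{\pi}{4} \\ \sin\frac{\pi}{4} & \cos\frac{\pi}{4} \end{pmatrix}
= \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}.
\]


Activity 7.12 Confirm this by sketching the vectors e_1 and e_2 and the 
image vectors 

\[
T(\mathbf{e}_1) = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}
\quad\text{and}\quad
T(\mathbf{e}_2) = \begin{pmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}.
\]

What is the matrix of the linear transformation which is a rotation 
anticlockwise by π radians? What is the matrix of the linear transformation 
which is a reflection in the line y = x? Think about what each of these 
two transformations does to the standard basis vectors e_1 and e_2 and 
find these matrices. 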


7.1.4 Identity and zero linear transformations 


If V is a vector space, we can define a linear transformation T : V → V 
by T(v) = v, called the identity linear transformation. 

If V = R^n, the matrix of this linear transformation is I, the n × n 
identity matrix. 

There is also a linear transformation T : V → W defined by T(v) = 
0, mapping every vector in V to the zero vector in W. 

If V = R^n and W = R^m, the matrix of this linear transformation is 
an m × n matrix consisting entirely of zeros. 


7.1.5 Composition and combinations of linear 
transformations 


The composition of linear transformations is again a linear transformation. 
If T : V → W and S : W → U, then ST is the linear transformation 
given by 

\[
ST(\mathbf{v}) = S(T(\mathbf{v})) = S(\mathbf{w}) = \mathbf{u},
\]

where w = T(v). Note that ST means do T and then do S; that is, 

\[
\mathbf{v} \xrightarrow{\;T\;} \mathbf{w} \xrightarrow{\;S\;} \mathbf{u}.
\]

(For ST, work from the inside, out.) 
If T : R^n → R^m and S : R^m → R^p, then in terms of matrices, 

\[
ST(\mathbf{v}) = S(T(\mathbf{v})) = S(A_T\mathbf{v}) = A_S A_T \mathbf{v}.
\]

That is, A_{ST} = A_S A_T; the matrix of the composition is obtained by 
matrix multiplication of the matrices of the linear transformations. The 
order is important. Composition of linear transformations, like 
multiplication of matrices, is not commutative. 
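
A small sketch can make the order of composition concrete. The two transformations used here are choices made for this illustration: T is the reflection in the x axis and S is the anticlockwise rotation by π/2, whose matrix follows from the rotation formula with cos(π/2) = 0 and sin(π/2) = 1. The helper names `mat_mul` and `mat_vec` are hypothetical.

```python
def mat_mul(A, B):
    """Matrix product AB for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_vec(A, v):
    """Matrix-vector product Av."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A_T = [[1, 0], [0, -1]]   # reflection in the x axis
A_S = [[0, -1], [1, 0]]   # rotation anticlockwise by pi/2

A_ST = mat_mul(A_S, A_T)  # ST: do T first, then S
A_TS = mat_mul(A_T, A_S)  # TS: do S first, then T

v = [2, 5]
step_by_step = mat_vec(A_S, mat_vec(A_T, v))  # S(T(v))
in_one_go = mat_vec(A_ST, v)                  # (A_S A_T) v
print(A_ST, A_TS, step_by_step, in_one_go)
```

The two products differ, so ST ≠ TS for this pair, while applying T and then S to v agrees with multiplying v by A_S A_T.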


Activity 7.13 What are the sizes of the matrices A_S and A_T? Show 
that the sizes of these matrices indicate in what order they should be 
multiplied (and therefore in what order the composition of the linear 
transformations is written). 


A linear combination of linear transformations is again a linear 
transformation. If S, T : V → W are linear transformations between the same 
vector spaces, then S + T and αS, α ∈ R, are linear transformations, 
and therefore so is αS + βT for any α, β ∈ R. 


Activity 7.14 If you have any doubts about why any of the linear 
transformations mentioned in this section are linear transformations, 
try to prove that they are by showing the linearity conditions. 


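For example, the composition ST is a linear transformation because 

\[
\begin{aligned}
ST(\alpha\mathbf{x} + \beta\mathbf{y}) &= S(T(\alpha\mathbf{x} + \beta\mathbf{y})) \\
&= S(\alpha T(\mathbf{x}) + \beta T(\mathbf{y})) \\
&= \alpha S(T(\mathbf{x})) + \beta S(T(\mathbf{y})) \\
&= \alpha ST(\mathbf{x}) + \beta ST(\mathbf{y}).
\end{aligned}
\]


7.1.6 Inverse linear transformations 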


If V and W are finite-dimensional vector spaces of the same dimension, 
then the inverse of a linear transformation T : V → W is the linear 
transformation T^{-1} : W → V such that 

\[
T^{-1}(T(\mathbf{v})) = \mathbf{v}.
\]

In R^n, if T^{-1} exists, then its matrix satisfies 

\[
T^{-1}(T(\mathbf{v})) = A_{T^{-1}} A_T \mathbf{v} = I\mathbf{v}.
\]

That is, T^{-1} exists if and only if (A_T)^{-1} exists, and (A_T)^{-1} = A_{T^{-1}}. 


Activity 7.15 What result about inverse matrices is being used here in 
order to make this conclusion? 


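Example 7.16 In R^2, the inverse of rotation anticlockwise by an angle θ 
is rotation clockwise by the same angle. Thinking of clockwise rotation 
by θ as anticlockwise rotation by an angle −θ, the matrix of rotation 
clockwise by θ is given by 

\[
A_{T^{-1}} = \begin{pmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{pmatrix}
= \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}.
\]

This is easily checked: 

\[
A_{T^{-1}} A_T = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
\]


Activity 7.17 Check this by multiplying the matrices. 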


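Example 7.18 Is there an inverse to the linear transformation in 
Example 7.9, 

\[
T\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} x + y + z \\ x - y \\ x + 2y - 3z \end{pmatrix}?
\]

We found that T is represented by the matrix 

\[
A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 1 & 2 & -3 \end{pmatrix}.
\]

Since |A| = 9, the matrix is invertible, and T^{-1} is given by the matrix 

\[
A^{-1} = \frac{1}{9}\begin{pmatrix} 3 & 5 & 1 \\ 3 & -4 & 1 \\ 3 & -1 & -2 \end{pmatrix}.
\]

That is, 

\[
T^{-1}\begin{pmatrix} u \\ v \\ w \end{pmatrix}
= \begin{pmatrix}
\frac{1}{3}u + \frac{5}{9}v + \frac{1}{9}w \\
\frac{1}{3}u - \frac{4}{9}v + \frac{1}{9}w \\
\frac{1}{3}u - \frac{1}{9}v - \frac{2}{9}w
\end{pmatrix}.
\]


Activity 7.19 Check that T^{-1}(w) for w = T(u) = (6, −1, −4)^T is 
u = (1, 2, 3)^T (see Example 7.9). Also check that T^{-1}T = I. 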
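
The inverse in Example 7.18 can be reproduced with exact rational arithmetic. The following is a minimal sketch using Gauss-Jordan elimination on the augmented matrix (A | I); the function names `inverse` and `mat_vec` are introduced here for illustration, and this is one standard way of inverting a matrix rather than necessarily the method the text has in mind.

```python
from fractions import Fraction


def inverse(A):
    """Invert a square matrix by row reducing (A | I) with exact fractions."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [entry / pivot_value for entry in M[col]]
        # Clear the rest of the column.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def mat_vec(A, v):
    """Matrix-vector product Av."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = [[1, 1, 1], [1, -1, 0], [1, 2, -3]]   # matrix of T from Example 7.9
A_inv = inverse(A)
recovered = mat_vec(A_inv, [6, -1, -4])   # apply the inverse to T(u)
print(A_inv)
print(recovered)
```

Applying the inverse matrix to T(u) = (6, −1, −4)^T returns the original vector u = (1, 2, 3)^T.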


7.1.7 Linear transformations from V to W 


Theorem 7.20 Let V be a finite-dimensional vector space and let T 
be a linear transformation from V to a vector space W. Then T is 
completely determined by what it does to a basis of V. 


Proof. Let dim(V) = n, and let B = {v_1, v_2, ..., v_n} be a basis of V. 
Then any v ∈ V can be uniquely expressed as a linear combination of 
these basis vectors: v = a_1 v_1 + a_2 v_2 + ··· + a_n v_n. 

Then 

\[
\begin{aligned}
T(\mathbf{v}) &= T(a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_n\mathbf{v}_n) \\
&= a_1 T(\mathbf{v}_1) + a_2 T(\mathbf{v}_2) + \cdots + a_n T(\mathbf{v}_n).
\end{aligned}
\]

That is, if v ∈ V is expressed as a linear combination of the basis vectors, 
then the image T(v) is the same linear combination of the images of the 
basis vectors. Therefore, if we know T on the basis vectors, we know it 
for all v ∈ V. 


If both V and W are finite-dimensional vector spaces, then this result 
allows us to find a matrix which corresponds to the linear transformation 
T. The matrix will depend on the basis of V and the basis of W. 
Suppose V has dim(V) = n and basis B = {v_1, v_2, ..., v_n}, and that W 
has dim(W) = m and basis S = {w_1, w_2, ..., w_m}. Then the coordinate 
vector of a vector v ∈ V is denoted by [v]_B, and the coordinate vector of 
a vector T(v) ∈ W is denoted [T(v)]_S. By working with the coordinates 
of these vectors (rather than the vectors themselves), we can find a 
matrix such that [T(v)]_S = A[v]_B. 
Using the result above we can write 

\[
\begin{aligned}
[T(\mathbf{v})]_S &= a_1[T(\mathbf{v}_1)]_S + a_2[T(\mathbf{v}_2)]_S + \cdots + a_n[T(\mathbf{v}_n)]_S \\
&= \big([T(\mathbf{v}_1)]_S \;\; [T(\mathbf{v}_2)]_S \;\; \ldots \;\; [T(\mathbf{v}_n)]_S\big)\,[\mathbf{v}]_B ,
\end{aligned}
\]

where [v]_B = (a_1, a_2, ..., a_n)^T is the coordinate matrix of the vector 
v in the basis B, and [T(v_i)]_S are the coordinates of the image vectors 
in the basis S (see Exercise 6.11). That is, A is the matrix whose 
columns are the images T(v_i) expressed in the basis of W. Then we 
have [T(v)]_S = A[v]_B. 


7.2 Range and null space 
7.2.1 Definitions of range and null space 


Just as we have the range and null space of a matrix, so we have the 
range and null space of a linear transformation, defined as follows: 


Definition 7.21 (Range and null space) Suppose that T is a linear 
transformation from a vector space V to a vector space W. Then the 
range, R(T), of T is 

\[
R(T) = \{T(\mathbf{v}) \mid \mathbf{v} \in V\},
\]

and the null space, N(T), of T is 

\[
N(T) = \{\mathbf{v} \in V \mid T(\mathbf{v}) = \mathbf{0}\},
\]

where 0 denotes the zero vector of W. 


The null space is also called the kernel, and may be denoted ker(T) in 
some texts. 

The range and null space of a linear transformation T : V → W 
are subspaces of W and V, respectively. 


Activity 7.22 Prove this. Try this yourself before looking at the answer 
to this activity. 


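Of course, for any m × n matrix, A, if T is the linear transformation 
T(x) = Ax, then R(T) = R(A) and N(T) = N(A). The definitions of 
the subspaces are the same since T : R^n → R^m and for all x ∈ R^n, if 
T(x) = Ax, we have: 

\[
R(T) = \{T(\mathbf{x}) \mid \mathbf{x} \in \mathbb{R}^n\} = \{A\mathbf{x} \mid \mathbf{x} \in \mathbb{R}^n\} = R(A) \subseteq \mathbb{R}^m,
\]

and 

\[
N(T) = \{\mathbf{x} \in \mathbb{R}^n \mid T(\mathbf{x}) = \mathbf{0}\} = \{\mathbf{x} \in \mathbb{R}^n \mid A\mathbf{x} = \mathbf{0}\} = N(A) \subseteq \mathbb{R}^n.
\]


Example 7.23 We find the null space and range of the linear transformation 
S : R^2 → R^4 given by 

\[
S\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} x + y \\ x \\ x - y \\ y \end{pmatrix}.
\]

The matrix of the linear transformation is 

\[
A_S = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 1 & -1 \\ 0 & 1 \end{pmatrix}.
\]

Observe that this matrix has rank 2 (by having two linearly independent 
columns, or you could alternatively see this by putting it into row 
echelon form), so that N(S) = {0}, the subspace of R^2 consisting of 
only the zero vector. This can also be seen directly from the fact that 

\[
\begin{pmatrix} x + y \\ x \\ x - y \\ y \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\;\Longrightarrow\; x = 0,\; y = 0
\;\Longrightarrow\;
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]

The range, R(S), is the two-dimensional subspace of R^4 with basis 
given by the column vectors of A_S. 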
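
The rank claimed in Example 7.23 can be confirmed by row reduction. The sketch below is a minimal, hypothetical `rank` function using exact fractions and forward elimination; the nullity is then obtained as the number of columns minus the rank, which relies on the rank-nullity theorem for matrices.

```python
from fractions import Fraction


def rank(A):
    """Rank of a matrix, found by reducing it to row echelon form."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0  # number of pivots found so far
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


A_S = [[1, 1], [1, 0], [1, -1], [0, 1]]  # matrix of S from Example 7.23
rank_S = rank(A_S)
nullity_S = len(A_S[0]) - rank_S         # columns minus rank
print(rank_S, nullity_S)
```

A nullity of 0 corresponds to N(S) = {0}, and a rank of 2 says that the two columns of the matrix form a basis of R(S).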


7.2.2 Rank—nullity theorem for linear transformations 


If V and W are both finite-dimensional, then so are R(T) and N(T), 
and we refer to their dimensions as the rank and nullity of the linear 
transformation, respectively. 


Definition 7.24 (Rank and nullity of a linear transformation) The 
rank of a linear transformation T is 

\[
\operatorname{rank}(T) = \dim(R(T))
\]

and the nullity is 

\[
\operatorname{nullity}(T) = \dim(N(T)).
\]


As for matrices, there is a strong link between these two dimensions 
known as the Rank-nullity theorem or the Dimension theorem for linear 
transformations. Here we are concerned with subspaces of any vector 
spaces V and W (not just Euclidean spaces). 


Theorem 7.25 (Rank-nullity theorem for linear transformations) 
Suppose that T is a linear transformation from the finite-dimensional 
vector space V to the vector space W. Then 

\[
\operatorname{rank}(T) + \operatorname{nullity}(T) = \dim(V).
\]

(Note that this result holds even if W is not finite-dimensional.) 


Proof: Assume that dim(V) = n and nullity(T) = k. We need to show 
that rank(T) = n − k. Let {v_1, v_2, ..., v_k} be a basis of the null space, 
N(T). As N(T) is a subspace of V, we can extend this basis to a basis 
of V, {v_1, v_2, ..., v_k, v_{k+1}, ..., v_n} (by Theorem 6.45). For any v ∈ V, 
we have v = a_1 v_1 + a_2 v_2 + ··· + a_n v_n. Then, 

\[
\begin{aligned}
T(\mathbf{v}) &= a_1 T(\mathbf{v}_1) + \cdots + a_k T(\mathbf{v}_k) + a_{k+1} T(\mathbf{v}_{k+1}) + \cdots + a_n T(\mathbf{v}_n) \\
&= a_1\mathbf{0} + \cdots + a_k\mathbf{0} + a_{k+1} T(\mathbf{v}_{k+1}) + \cdots + a_n T(\mathbf{v}_n) \\
&= a_{k+1} T(\mathbf{v}_{k+1}) + \cdots + a_n T(\mathbf{v}_n),
\end{aligned}
\]

since T(v_i) = 0 for i = 1, ..., k (because v_i ∈ N(T)). Hence the 
vectors {T(v_{k+1}), ..., T(v_n)} span the range, R(T). If they are a basis 
of R(T), then rank(T) = n − k. So it only remains to show that they 
are linearly independent. 
If there is a linear combination of the vectors equal to the zero 
vector, 

\[
b_{k+1} T(\mathbf{v}_{k+1}) + \cdots + b_n T(\mathbf{v}_n) = T(b_{k+1}\mathbf{v}_{k+1} + \cdots + b_n\mathbf{v}_n) = \mathbf{0},
\]

then the vector b_{k+1} v_{k+1} + ··· + b_n v_n is in the null space of T, and can 
be written as a linear combination of the basis vectors of N(T), 

\[
b_{k+1}\mathbf{v}_{k+1} + \cdots + b_n\mathbf{v}_n = b_1\mathbf{v}_1 + \cdots + b_k\mathbf{v}_k .
\]

Rearranging, we have 

\[
b_1\mathbf{v}_1 + \cdots + b_k\mathbf{v}_k - b_{k+1}\mathbf{v}_{k+1} - \cdots - b_n\mathbf{v}_n = \mathbf{0}.
\]

But {v_1, v_2, ..., v_k, v_{k+1}, ..., v_n} is a basis of V, and hence all 
coefficients b_i must be 0. This shows that {T(v_{k+1}), ..., T(v_n)} are linearly 
independent and the theorem is proved. 


For an m × n matrix A, if T(x) = Ax, then T is a linear transformation 
from V = R^n to W = R^m, and we have rank(T) = rank(A) and 
nullity(T) = nullity(A). So this theorem is the same as the earlier result 
that 

\[
\operatorname{rank}(A) + \operatorname{nullity}(A) = n.
\]

Here n is the dimension of R^n = V (which, of course, is the same as 
the number of columns of A). 


Example 7.26 Is it possible to construct a linear transformation 
T : R^3 → R^3 with 

\[
N(T) = \left\{ t\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} : t \in \mathbb{R} \right\},
\qquad R(T) = xy\text{-plane?}
\]

A linear transformation T : R^3 → R^3 must satisfy the rank-nullity 
theorem with n = 3: 

\[
\operatorname{nullity}(T) + \operatorname{rank}(T) = 3.
\]

Since the dimension of the null space of T is 1 and the dimension of 
R(T) is 2, the rank-nullity theorem is satisfied, so at this stage, we 
certainly can’t rule out the possibility that such a linear transformation 
exists. (Of course, if it was not satisfied, we would know straight away 
that we couldn’t have a linear transformation of the type suggested.) 

To find a linear transformation T with N(T) and R(T) as above, 
we construct a matrix A_T, which must be 3 × 3 since T : R^3 → R^3. 
Note that if R(A_T) = R(T) is the xy-plane, then the column vectors 
of A_T must be linearly dependent and include a basis for this plane. 
You can take any two linearly independent vectors in the xy-plane to 
be the first two columns of the matrix, and the third column must be 
a linear combination of the first two. The linear dependency condition 
they must satisfy is revealed by the basis of the null space. 

For example, we may take the first two column vectors to be the 
standard basis vectors, c_1 = e_1 and c_2 = e_2. Then if v is the null space 
basis vector, we must have A_T v = 0. This means 

\[
A_T\mathbf{v} = (\mathbf{e}_1 \;\; \mathbf{e}_2 \;\; \mathbf{c}_3)\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
= 1\mathbf{c}_1 + 2\mathbf{c}_2 + 3\mathbf{c}_3 = \mathbf{0}.
\]

Therefore, we must have c_3 = −(1/3)c_1 − (2/3)c_2. So one possible linear 
transformation satisfying these conditions is given by the matrix 

\[
A_T = \begin{pmatrix} 1 & 0 & -\frac{1}{3} \\ 0 & 1 & -\frac{2}{3} \\ 0 & 0 & 0 \end{pmatrix}.
\]
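
The matrix constructed in Example 7.26 can be checked with exact arithmetic. This sketch, with a hypothetical helper `mat_vec`, verifies two things: the vector (1, 2, 3)^T is sent to the zero vector, and the image of every standard basis vector has third component zero, so the range lies in the xy-plane. Since the first two columns are e_1 and e_2, the range is then the whole xy-plane.

```python
from fractions import Fraction


def mat_vec(A, v):
    """Matrix-vector product Av."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# Matrix from Example 7.26: columns e_1, e_2 and c_3 = -(1/3)e_1 - (2/3)e_2.
A_T = [
    [1, 0, Fraction(-1, 3)],
    [0, 1, Fraction(-2, 3)],
    [0, 0, 0],
]

v = [1, 2, 3]                      # basis vector of the required null space
image_of_v = mat_vec(A_T, v)       # should be the zero vector

standard_basis = ([1, 0, 0], [0, 1, 0], [0, 0, 1])
images_of_basis = [mat_vec(A_T, e) for e in standard_basis]  # columns of A_T
in_xy_plane = all(image[2] == 0 for image in images_of_basis)
print(image_of_v, images_of_basis, in_xy_plane)
```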


7.3 Coordinate change 


In this section, we shall limit our discussion to R^n for some n, but 
much of what we say can be extended to any finite-dimensional vector 
space V. 

Suppose that the vectors v_1, v_2, ..., v_n form a basis B for R^n. Then, as 
we have seen, any x ∈ R^n can be written in exactly one way as a linear 
combination, 

\[
\mathbf{x} = \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_n\mathbf{v}_n ,
\]

of the vectors in the basis, and the vector 

\[
[\mathbf{x}]_B = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}_B
\]

is called the coordinate vector of x with respect to the basis B = 
{v_1, v_2, ..., v_n}. 

One very straightforward observation is that the coordinate vector 
of any x ∈ R^n with respect to the standard basis is just x itself. This is 
because if x = (x_1, x_2, ..., x_n)^T, then 

\[
\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n .
\]

What is less immediately obvious is how to find the coordinates of a 
vector x with respect to a basis other than the standard one. 


7.3.1 Change of coordinates from standard to basis B 


To find the coordinates of a vector $\mathbf{x}$ with respect to a basis
$B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$, we need to solve the system of linear equations

\[ a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_n\mathbf{v}_n = \mathbf{x}, \]

which, in matrix form, is

\[ \mathbf{x} = (\mathbf{v}_1 \ \mathbf{v}_2 \ \ldots \ \mathbf{v}_n)\,\mathbf{a} \]

with $\mathbf{a} = (a_1, a_2, \ldots, a_n)^T = [\mathbf{x}]_B$. In other words, if we let $P$ be the
matrix whose columns are the basis vectors (in order),

\[ P = (\mathbf{v}_1 \ \mathbf{v}_2 \ \ldots \ \mathbf{v}_n), \]

then for any $\mathbf{x} \in \mathbb{R}^n$,

\[ \mathbf{x} = P[\mathbf{x}]_B. \]

The matrix $P$ is invertible because its columns form a basis. So we can
also write

\[ [\mathbf{x}]_B = P^{-1}\mathbf{x}. \]

Definition 7.27 (Transition matrix) If $B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis
of $\mathbb{R}^n$, the matrix

\[ P = (\mathbf{v}_1 \ \mathbf{v}_2 \ \ldots \ \mathbf{v}_n), \]

whose columns are the basis vectors in $B$, is called the transition matrix
from $B$ coordinates to standard coordinates. Then, the matrix $P^{-1}$ is
the transition matrix from standard coordinates to coordinates in the
basis $B$.

In order to emphasise the connection of a transition matrix $P$ with the
corresponding basis $B$, we will sometimes denote the matrix by $P_B$.


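Example 7.28 Let $B$ be the following set of vectors of $\mathbb{R}^3$:

\[ B = \left\{ \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \begin{pmatrix} 2 \\ -1 \\ 4 \end{pmatrix}, \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} \right\}. \]

To show that $B$ is a basis, we can write the vectors as the columns of a
matrix $P$,

\[ P = \begin{pmatrix} 1 & 2 & 3 \\ 2 & -1 & 2 \\ -1 & 4 & 1 \end{pmatrix}, \]

then evaluate the determinant. We have $|P| = 4 \neq 0$ so $B$ is a basis of
$\mathbb{R}^3$.

If we are given the $B$ coordinates of a vector $\mathbf{v}$, say

\[ [\mathbf{v}]_B = \begin{bmatrix} 4 \\ 1 \\ -5 \end{bmatrix}_B, \]

then we can find its standard coordinates either directly as a linear
combination of the basis vectors,

\[ \mathbf{v} = 4\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + \begin{pmatrix} 2 \\ -1 \\ 4 \end{pmatrix} - 5\begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} -9 \\ -3 \\ -5 \end{pmatrix}, \]

or by using the matrix $P$,

\[ \mathbf{v} = \begin{pmatrix} 1 & 2 & 3 \\ 2 & -1 & 2 \\ -1 & 4 & 1 \end{pmatrix} \begin{bmatrix} 4 \\ 1 \\ -5 \end{bmatrix}_B = \begin{pmatrix} -9 \\ -3 \\ -5 \end{pmatrix}, \]

which, of course, amounts to the same thing.

To find the $B$ coordinates of a vector $\mathbf{x}$, say $\mathbf{x} = (5, 7, -3)^T$, we
need to find constants $a_1, a_2, a_3$ such that

\[ \begin{pmatrix} 5 \\ 7 \\ -3 \end{pmatrix} = a_1\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + a_2\begin{pmatrix} 2 \\ -1 \\ 4 \end{pmatrix} + a_3\begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}. \]

We can do this either using Gaussian elimination to solve the system
$P\mathbf{a} = \mathbf{x}$ for $\mathbf{a} = (a_1, a_2, a_3)^T$ or by using the inverse matrix, $P^{-1}$, to
find

\[ [\mathbf{x}]_B = P^{-1}\mathbf{x} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}_B. \]

We can check the result as follows:

\[ 1\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + (-1)\begin{pmatrix} 2 \\ -1 \\ 4 \end{pmatrix} + 2\begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 5 \\ 7 \\ -3 \end{pmatrix}. \]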
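The Gaussian elimination route mentioned in Example 7.28 can also be carried out by machine. What follows is a minimal sketch in Python, using exact fractions so that no rounding occurs; the function name `solve` is our own choice, and the routine assumes that the matrix it is given is invertible (as $P$ is here, since its determinant is non-zero).

```python
from fractions import Fraction

def solve(P, x):
    """Solve P a = x by Gauss-Jordan elimination (P is assumed invertible)."""
    n = len(P)
    # Augmented matrix [P | x] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(x[i])]
         for i, row in enumerate(P)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero pivot entry.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so that the pivot entry becomes 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Clear every other entry in this column.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n] for row in M]

# The matrix P and the vector x of Example 7.28.
P = [[1, 2, 3],
     [2, -1, 2],
     [-1, 4, 1]]
x = [5, 7, -3]

coords = solve(P, x)  # the B coordinates [x]_B
# Rebuild x from its B coordinates, using x = P [x]_B.
rebuilt = [sum(P[i][j] * coords[j] for j in range(3)) for i in range(3)]
print(coords, rebuilt)
```

Solving the system directly avoids computing $P^{-1}$, which is usually the more economical choice when only one vector needs to be converted.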
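Activity 7.29 Check all the calculations in this example. Find $P^{-1}$ and
use it to find $[\mathbf{x}]_B$.


Activity 7.30 Continuing with this example, what are the $B$ coordinates
of the basis vectors

\[ \mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} 2 \\ -1 \\ 4 \end{pmatrix}, \quad \mathbf{v}_3 = \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}? \]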

7.3.2 Change of basis as a linear transformation 


If $P$ is the transition matrix from coordinates in a basis $B$ of $\mathbb{R}^n$ to
standard coordinates, then considered as the matrix of a linear
transformation, $T(\mathbf{x}) = P\mathbf{x}$, the linear transformation actually maps the standard
basis vectors, $\mathbf{e}_i$, to the new basis vectors, $\mathbf{v}_i$. That is, $T(\mathbf{e}_i) = \mathbf{v}_i$.


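Example 7.31 Suppose we wish to change basis in $\mathbb{R}^2$ by a rotation of
the axes $\frac{\pi}{4}$ radians anticlockwise. What are the coordinates of a vector
with respect to this new basis, $B = \{\mathbf{v}_1, \mathbf{v}_2\}$?

The matrix of the linear transformation which performs this rotation is
given by

\[ \begin{pmatrix} \cos\frac{\pi}{4} & -\sin\frac{\pi}{4} \\ \sin\frac{\pi}{4} & \cos\frac{\pi}{4} \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}, \]

and the column vectors of the matrix are the new basis vectors,
$\mathbf{v}_1 = T(\mathbf{e}_1)$ and $\mathbf{v}_2 = T(\mathbf{e}_2)$, since these are the images of the standard basis
vectors. So the matrix is also the transition matrix from $B$ coordinates
to standard coordinates:

\[ P = \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}, \]

and we have $\mathbf{v} = P[\mathbf{v}]_B$. Then the coordinates of a vector with respect
to the new basis are given by $[\mathbf{v}]_B = P^{-1}\mathbf{v}$. The inverse of rotation
anticlockwise is rotation clockwise, so we have

\[ P^{-1} = \begin{pmatrix} \cos(-\frac{\pi}{4}) & -\sin(-\frac{\pi}{4}) \\ \sin(-\frac{\pi}{4}) & \cos(-\frac{\pi}{4}) \end{pmatrix} = \begin{pmatrix} \cos\frac{\pi}{4} & \sin\frac{\pi}{4} \\ -\sin\frac{\pi}{4} & \cos\frac{\pi}{4} \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}. \]

Suppose we want the new coordinates of a vector, say $\mathbf{x} = (1, 1)^T$. Then
we have

\[ [\mathbf{x}]_B = P^{-1}\mathbf{x} = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{bmatrix} \sqrt{2} \\ 0 \end{bmatrix}_B, \]

so that $\mathbf{x} = \sqrt{2}\,\mathbf{v}_1$.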
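As a numerical illustration of Example 7.31, the following short Python sketch builds the rotation matrix for an angle of $\frac{\pi}{4}$, forms its inverse as the rotation through the opposite angle, and applies it to $\mathbf{x} = (1, 1)^T$. The variable names are our own; because floating-point arithmetic is used, the results agree with $\sqrt{2}$ and $0$ only up to rounding error.

```python
import math

theta = math.pi / 4
c, s = math.cos(theta), math.sin(theta)

# Transition matrix from B coordinates to standard coordinates.
# Its columns are the new basis vectors v1 = T(e1) and v2 = T(e2).
P = [[c, -s],
     [s,  c]]

# The inverse is the rotation through -theta (also the transpose of P).
P_inv = [[ c, s],
         [-s, c]]

x = [1.0, 1.0]
# B coordinates of x, computed as P^{-1} x.
x_B = [P_inv[0][0] * x[0] + P_inv[0][1] * x[1],
       P_inv[1][0] * x[0] + P_inv[1][1] * x[1]]
print(x_B)
```

The second coordinate is (numerically) zero because $\mathbf{x}$ lies along the new first axis.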


7.3.3 Change of coordinates from basis B to basis B’ 


Given a basis $B$ of $\mathbb{R}^n$ with transition matrix $P_B$, and another basis $B'$
with transition matrix $P_{B'}$, how do we change from coordinates in the
basis $B$ to coordinates in the basis $B'$?

The answer is quite simple. First, we change from $B$ coordinates to
standard coordinates using $\mathbf{v} = P_B[\mathbf{v}]_B$, and then change from standard
coordinates to $B'$ coordinates using $[\mathbf{v}]_{B'} = P_{B'}^{-1}\mathbf{v}$. That is,

\[ [\mathbf{v}]_{B'} = P_{B'}^{-1}P_B[\mathbf{v}]_B. \]

The matrix $M = P_{B'}^{-1}P_B$ is the transition matrix from $B$ coordinates to
$B'$ coordinates.

In practice, the easiest way to obtain the matrix $M$ is as the product
of the two transition matrices, $M = P_{B'}^{-1}P_B$. But let’s look more closely
at the matrix $M$. If the basis $B$ is the set of vectors $B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$,
then these are the columns of the transition matrix, $P_B = (\mathbf{v}_1 \ \mathbf{v}_2 \ \ldots \ \mathbf{v}_n)$.
Looking closely at the columns of the product matrix,

\[ M = P_{B'}^{-1}P_B = P_{B'}^{-1}(\mathbf{v}_1 \ \mathbf{v}_2 \ \ldots \ \mathbf{v}_n) = (P_{B'}^{-1}\mathbf{v}_1 \ \ P_{B'}^{-1}\mathbf{v}_2 \ \ldots \ P_{B'}^{-1}\mathbf{v}_n); \]

that is, each column of the matrix $M$ is obtained by multiplying the
matrix $P_{B'}^{-1}$ by the corresponding column of $P_B$. But $P_{B'}^{-1}\mathbf{v}_i$ is just the
$B'$ coordinates of the vector $\mathbf{v}_i$, so the matrix $M$ is given by

\[ M = ([\mathbf{v}_1]_{B'} \ [\mathbf{v}_2]_{B'} \ \ldots \ [\mathbf{v}_n]_{B'}). \]


We have therefore established the following result: 


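Theorem 7.32 If $B$ and $B'$ are two bases of $\mathbb{R}^n$, with

\[ B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}, \]

then the transition matrix from $B$ coordinates to $B'$ coordinates is given
by

\[ M = ([\mathbf{v}_1]_{B'} \ [\mathbf{v}_2]_{B'} \ \ldots \ [\mathbf{v}_n]_{B'}). \]


Activity 7.33 The above proof used the following fact about matrix
multiplication. If $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix with
column vectors $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_p$, then the product $AB$ is the $m \times p$ matrix
whose columns are $A\mathbf{b}_1, A\mathbf{b}_2, \ldots, A\mathbf{b}_p$; that is,

\[ AB = A(\mathbf{b}_1 \ \mathbf{b}_2 \ \ldots \ \mathbf{b}_p) = (A\mathbf{b}_1 \ A\mathbf{b}_2 \ \ldots \ A\mathbf{b}_p). \]

Why is this correct?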


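Example 7.34 Each of the sets of vectors

\[ B = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} -1 \\ 1 \end{pmatrix} \right\}, \qquad S = \left\{ \begin{pmatrix} 3 \\ 1 \end{pmatrix}, \begin{pmatrix} 5 \\ 2 \end{pmatrix} \right\} \]

is a basis of $\mathbb{R}^2$, since if

\[ P = \begin{pmatrix} 1 & -1 \\ 2 & 1 \end{pmatrix}, \qquad Q = \begin{pmatrix} 3 & 5 \\ 1 & 2 \end{pmatrix}, \]

then their determinants, $|P| = 3$ and $|Q| = 1$, are non-zero. $P$ is the
transition matrix from $B$ coordinates to standard and $Q$ is the transition
matrix from $S$ coordinates to standard.

Suppose you are given a vector $\mathbf{x} \in \mathbb{R}^2$ with

\[ [\mathbf{x}]_B = \begin{bmatrix} 4 \\ -1 \end{bmatrix}_B. \]

How do you find the coordinates of $\mathbf{x}$ in the basis $S$? There are two
approaches you can take.

First, you can find the standard coordinates of $\mathbf{x}$,

\[ \mathbf{x} = 4\begin{pmatrix} 1 \\ 2 \end{pmatrix} - \begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 4 \\ -1 \end{pmatrix} = \begin{pmatrix} 5 \\ 7 \end{pmatrix}, \]

and then find the $S$ coordinates using $Q^{-1}$:

\[ [\mathbf{x}]_S = Q^{-1}\mathbf{x} = \begin{pmatrix} 2 & -5 \\ -1 & 3 \end{pmatrix}\begin{pmatrix} 5 \\ 7 \end{pmatrix} = \begin{bmatrix} -25 \\ 16 \end{bmatrix}_S. \]

Alternatively, you can calculate the transition matrix $M$ from $B$
coordinates to $S$ coordinates. Using $\mathbf{v} = P[\mathbf{v}]_B$ and $\mathbf{v} = Q[\mathbf{v}]_S$, we have
$[\mathbf{v}]_S = Q^{-1}P[\mathbf{v}]_B$, so

\[ M = Q^{-1}P = \begin{pmatrix} 2 & -5 \\ -1 & 3 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} -8 & -7 \\ 5 & 4 \end{pmatrix}, \]

\[ [\mathbf{x}]_S = \begin{pmatrix} -8 & -7 \\ 5 & 4 \end{pmatrix}\begin{bmatrix} 4 \\ -1 \end{bmatrix}_B = \begin{bmatrix} -25 \\ 16 \end{bmatrix}_S. \]

Note that the columns of $M$ are the $S$ coordinates of the basis $B$ vectors
(which would be another way to find $M$).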
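The two approaches of Example 7.34 can be compared by machine. The sketch below, in Python with exact fractions, takes $P$ to have columns $(1, 2)^T$, $(-1, 1)^T$ and $Q$ to have columns $(3, 1)^T$, $(5, 2)^T$, as in the example; the helper names `mat_mul` and `inverse_2x2` are our own and are not part of any library.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_2x2(A):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det],
            [-c / det, a / det]]

P = [[1, -1],
     [2, 1]]   # transition matrix from B coordinates to standard
Q = [[3, 5],
     [1, 2]]   # transition matrix from S coordinates to standard

x_B = [[4], [-1]]  # the B coordinates of x, as a column

# Approach 1: go through standard coordinates.
x_std = mat_mul(P, x_B)
x_S_direct = mat_mul(inverse_2x2(Q), x_std)

# Approach 2: use the transition matrix M from B to S coordinates.
M = mat_mul(inverse_2x2(Q), P)
x_S_via_M = mat_mul(M, x_B)

print(M, x_S_direct, x_S_via_M)
```

If many vectors have to be converted, the second approach is the more convenient one, since $M$ is computed once and then reused.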


Activity 7.35 Check the calculations in this example. In particular, 
check the S coordinates of the vector x, and check that the columns of 
M are the basis B vectors in S coordinates. 


7.4 Change of basis and similarity 


7.4.1 Change of basis and linear transformations 


We have already seen that if $T$ is a linear transformation from $\mathbb{R}^n$ to
$\mathbb{R}^m$, then there is a corresponding matrix $A$ such that $T(\mathbf{x}) = A\mathbf{x}$ for all
$\mathbf{x}$. The matrix $A$ is given by

\[ A = (T(\mathbf{e}_1) \ T(\mathbf{e}_2) \ \ldots \ T(\mathbf{e}_n)). \]

This matrix is obtained using the standard basis in both $\mathbb{R}^n$ and in $\mathbb{R}^m$.
Now suppose that $B$ is a basis of $\mathbb{R}^n$ and $B'$ a basis of $\mathbb{R}^m$, and suppose
we want to know the coordinates $[T(\mathbf{x})]_{B'}$ of $T(\mathbf{x})$ with respect to $B'$,
given the coordinates $[\mathbf{x}]_B$ of $\mathbf{x}$ with respect to $B$. Is there a matrix $M$
such that

\[ [T(\mathbf{x})]_{B'} = M[\mathbf{x}]_B \]

for all $\mathbf{x}$? Indeed there is, as the following result shows.


Theorem 7.36 Suppose that $B = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ and $B' = \{\mathbf{v}'_1, \ldots, \mathbf{v}'_m\}$
are bases of $\mathbb{R}^n$ and $\mathbb{R}^m$ and that $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear
transformation. Let $M = A_{[B,B']}$ be the $m \times n$ matrix with the $i$th column equal to
$[T(\mathbf{v}_i)]_{B'}$, the coordinate vector of $T(\mathbf{v}_i)$ with respect to the basis $B'$.
Then for all $\mathbf{x}$, $[T(\mathbf{x})]_{B'} = M[\mathbf{x}]_B$.


The matrix $M = A_{[B,B']}$ is the matrix which represents $T$ with respect
to the bases $B$ and $B'$.


Proof: In order to prove this theorem, let’s look at the stages of transition 
which occur from changing basis from B to standard, performing the 
linear transformation in standard coordinates and then changing to the 
basis B’. 

Let $A$ be the matrix representing $T$ in standard coordinates, and
let $P_B$ and $P_{B'}$ be, respectively, the transition matrix from $B$
coordinates to standard coordinates in $\mathbb{R}^n$ and the transition matrix from $B'$
coordinates to standard coordinates in $\mathbb{R}^m$. (So $P_B$ is an $n \times n$ matrix
having the basis vectors of $B$ as columns, and $P_{B'}$ is an $m \times m$ matrix
having the basis vectors of $B'$ as columns.) Then we know that for any
$\mathbf{x} \in \mathbb{R}^n$, $\mathbf{x} = P_B[\mathbf{x}]_B$; and, similarly, for any $\mathbf{u} \in \mathbb{R}^m$, $\mathbf{u} = P_{B'}[\mathbf{u}]_{B'}$, so
$[\mathbf{u}]_{B'} = P_{B'}^{-1}\mathbf{u}$.

We want to find a matrix $M$ such that $[T(\mathbf{x})]_{B'} = M[\mathbf{x}]_B$. If we
start with a vector $\mathbf{x}$ in $B$ coordinates, then $\mathbf{x} = P_B[\mathbf{x}]_B$ will give us the
standard coordinates. We can then perform the linear transformation on
$\mathbf{x}$ using the matrix $A$,

\[ T(\mathbf{x}) = A\mathbf{x} = AP_B[\mathbf{x}]_B, \]

giving us the image vector $T(\mathbf{x})$ in standard coordinates in $\mathbb{R}^m$. To
obtain the $B'$ coordinates of this vector, all we have to do is multiply on
the left by the matrix $P_{B'}^{-1}$; that is,

\[ [T(\mathbf{x})]_{B'} = P_{B'}^{-1}T(\mathbf{x}). \]

Then substituting what we found for $T(\mathbf{x})$ in standard coordinates,

\[ [T(\mathbf{x})]_{B'} = P_{B'}^{-1}AP_B[\mathbf{x}]_B. \]


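Since this is true for any $\mathbf{x} \in \mathbb{R}^n$, we conclude that

\[ M = P_{B'}^{-1}AP_B \]

is the matrix of the linear transformation in the new bases.

This, in fact, is the easiest way to calculate the matrix $M$. But
let’s take a closer look at the columns of $M = P_{B'}^{-1}AP_B$. We have
$P_B = (\mathbf{v}_1 \ \mathbf{v}_2 \ \ldots \ \mathbf{v}_n)$, so

\[ AP_B = A(\mathbf{v}_1 \ \mathbf{v}_2 \ \ldots \ \mathbf{v}_n) = (A\mathbf{v}_1 \ A\mathbf{v}_2 \ \ldots \ A\mathbf{v}_n). \]

But $A\mathbf{v}_i = T(\mathbf{v}_i)$, so

\[ AP_B = (T(\mathbf{v}_1) \ T(\mathbf{v}_2) \ \ldots \ T(\mathbf{v}_n)). \]

Then $M = P_{B'}^{-1}AP_B$, so

\[ M = P_{B'}^{-1}(T(\mathbf{v}_1) \ T(\mathbf{v}_2) \ \ldots \ T(\mathbf{v}_n)) = (P_{B'}^{-1}T(\mathbf{v}_1) \ \ P_{B'}^{-1}T(\mathbf{v}_2) \ \ldots \ P_{B'}^{-1}T(\mathbf{v}_n)); \]

that is, $M = ([T(\mathbf{v}_1)]_{B'} \ [T(\mathbf{v}_2)]_{B'} \ \ldots \ [T(\mathbf{v}_n)]_{B'})$ and the theorem is
proved.


Thus, if we change the basis from the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, the
matrix representation of the linear transformation $T$ changes.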
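To see Theorem 7.36 at work with $m \neq n$, here is a small sketch in Python. The transformation and both bases are hypothetical, chosen by us purely for illustration (they do not appear elsewhere in this chapter): $T : \mathbb{R}^3 \to \mathbb{R}^2$ has the standard matrix `A`, the basis $B$ of $\mathbb{R}^3$ is given by the columns of `P_B`, and the basis $B'$ of $\mathbb{R}^2$ by the columns of `P_Bp`.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical T : R^3 -> R^2 in standard coordinates.
A = [[1, 0, 1],
     [0, 1, 1]]

# Hypothetical bases: the columns of P_B form B, the columns of P_Bp form B'.
P_B = [[1, 1, 1],
       [0, 1, 1],
       [0, 0, 1]]
P_Bp = [[1, 0],
        [1, 1]]

# Inverse of the 2 x 2 matrix P_Bp via the adjugate formula.
(a, b), (c, d) = P_Bp
det = Fraction(a * d - b * c)
P_Bp_inv = [[d / det, -b / det],
            [-c / det, a / det]]

# Matrix of T with respect to the bases B and B'.
M = mat_mul(P_Bp_inv, mat_mul(A, P_B))

# Compare both routes for the vector with B coordinates (1, 2, 3).
x_B = [[1], [2], [3]]
x = mat_mul(P_B, x_B)                            # standard coordinates of x
Tx_Bp_direct = mat_mul(P_Bp_inv, mat_mul(A, x))  # apply T, then convert to B'
Tx_Bp_via_M = mat_mul(M, x_B)                    # use M directly
print(M, Tx_Bp_direct, Tx_Bp_via_M)
```

Each column of `M` can also be read as the $B'$ coordinates of the image of the corresponding vector of $B$, which is the description of $M$ given in the theorem.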


7.4.2 Similarity 


A particular case of Theorem 7.36 is so important it is worth stating
separately. It corresponds to the case in which $m = n$ and $B' = B$.


Theorem 7.37 Suppose that $T : \mathbb{R}^n \to \mathbb{R}^n$ is a linear transformation
and that $B = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$ is a basis of $\mathbb{R}^n$. Let $A$ be the matrix
corresponding to $T$ in standard coordinates, so that $T(\mathbf{x}) = A\mathbf{x}$. Let

\[ P = (\mathbf{x}_1 \ \mathbf{x}_2 \ \ldots \ \mathbf{x}_n) \]

be the matrix whose columns are the vectors of $B$. Then for all $\mathbf{x} \in \mathbb{R}^n$,

\[ [T(\mathbf{x})]_B = P^{-1}AP[\mathbf{x}]_B. \]


In other words, $A_{[B,B]} = P^{-1}AP$ is the matrix representing $T$ in the
basis $B$. The relationship between the matrices $A_{[B,B]}$ and $A$ is a central
one in the theory of linear algebra. The matrix $A_{[B,B]}$ performs the
same linear transformation as the matrix $A$, only $A_{[B,B]}$ describes it in
terms of the basis $B$ rather than in standard coordinates. This likeness
of effect inspires the following definition.


Definition 7.38 (Similarity) We say that the square matrix $C$ is similar
to the matrix $A$ if there is an invertible matrix $P$ such that $C = P^{-1}AP$.


Note that ‘similar’ has a very precise meaning here: it doesn’t mean 
that the matrices somehow ‘look like’ each other (as normal use of the 
word similar would suggest), but that they represent the same linear 
transformation in different bases. 

Similarity defines an equivalence relation on matrices. Recall that an 
equivalence relation satisfies three properties; it is reflexive, symmetric 
and transitive (see Section 3.1.2). For similarity, this means that: 


• a matrix $A$ is similar to itself (reflexive),
• if $C$ is similar to $A$, then $A$ is similar to $C$ (symmetric), and
• if $D$ is similar to $C$, and $C$ to $A$, then $D$ is similar to $A$ (transitive).


Activity 7.39 Prove these! (Note that we have purposely not used the
letter $B$ here to denote a matrix, since we used it in the previous
discussion to denote a set of $n$ vectors which form a basis of $\mathbb{R}^n$.)


Because the relationship is symmetric, we usually just say that $A$ and
$C$ are similar matrices, meaning one is similar to the other, and we
can express this either as $C = P^{-1}AP$ or $A = Q^{-1}CQ$ for invertible
matrices $P$ and $Q$ (in which case $Q = P^{-1}$).

As we shall see in subsequent chapters, this relationship can be used
to great advantage if the new basis $B$ is chosen carefully.

Let’s look at some examples to see why we might want to change
basis from standard coordinates to another basis of $\mathbb{R}^n$.


Example 7.40 You may have seen graphs of conic sections, and you
may know, for example, that the set of points $(x, y) \in \mathbb{R}^2$ such that
$x^2 + y^2 = 1$ is a circle of radius one centered at the origin, and that,
similarly, the set of points $x^2 + 4y^2 = 4$ is an ellipse.

These equations are said to be in standard form and as such they
are easy to sketch. For example, to sketch a graph of the ellipse in $\mathbb{R}^2$,
all you need to do is note that if $x = 0$, then $y = \pm 1$, and if $y = 0$,
then $x = \pm 2$; mark these four points on a set of coordinate axes and
connect them in an ellipse (see Figure 7.2).

Figure 7.2 The ellipse $x^2 + 4y^2 = 4$

Suppose, however, we want to graph the set of points $(x, y) \in \mathbb{R}^2$
which satisfy the equation $5x^2 + 5y^2 - 6xy = 2$. It turns out that this,
too, is an ellipse, but this is not obvious, and sketching the graph is
far from easy. So suppose we are told to do the following: change the
basis in $\mathbb{R}^2$ by rotating the axes by $\frac{\pi}{4}$ radians anticlockwise, express the
equation in the new coordinates and then sketch it.
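Let’s see what happens. The linear transformation which
accomplishes this rotation has matrix

\[ P = \begin{pmatrix} \cos\frac{\pi}{4} & -\sin\frac{\pi}{4} \\ \sin\frac{\pi}{4} & \cos\frac{\pi}{4} \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}, \]

where the columns of $P$ are the new basis vectors. We’ll call the new
basis $B$ and denote the new coordinates of a vector $\mathbf{v}$ by $X$ and $Y$; more
precisely, $[\mathbf{v}]_B = (X, Y)^T$. Then $\mathbf{v} = P[\mathbf{v}]_B$, and we substitute

\[ x = \frac{1}{\sqrt{2}}X - \frac{1}{\sqrt{2}}Y, \qquad y = \frac{1}{\sqrt{2}}X + \frac{1}{\sqrt{2}}Y \]

into the equation $5x^2 + 5y^2 - 6xy = 2$ and collect terms. The result in
the new coordinates is the equation

\[ X^2 + 4Y^2 = 1. \]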
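The change of coordinates can be tested numerically before doing the algebra. The following Python sketch takes a dozen points on the curve $X^2 + 4Y^2 = 1$, converts them to standard coordinates using $x = \frac{1}{\sqrt{2}}(X - Y)$ and $y = \frac{1}{\sqrt{2}}(X + Y)$, and evaluates $5x^2 + 5y^2 - 6xy$ at each one. The sampling scheme and the function name are our own choices, and the values agree with 2 only up to floating-point rounding.

```python
import math

def original_form(x, y):
    """Left-hand side of the equation 5x^2 + 5y^2 - 6xy = 2."""
    return 5 * x**2 + 5 * y**2 - 6 * x * y

r = 1 / math.sqrt(2)
values = []
for k in range(12):
    t = 2 * math.pi * k / 12
    # A point satisfying X^2 + 4Y^2 = 1 in the new coordinates.
    X, Y = math.cos(t), math.sin(t) / 2
    # Standard coordinates, from (x, y)^T = P (X, Y)^T.
    x = r * X - r * Y
    y = r * X + r * Y
    values.append(original_form(x, y))

print(values)
```

A numerical check of this kind does not replace the substitution itself, but it is a quick way to catch a sign error in the rotation matrix.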


Activity 7.41 Carry out the substitution for x and y to obtain the new 
equation. 


So how do we sketch this? The new coordinate axes are obtained from
the standard ones by rotating the plane by $\frac{\pi}{4}$ radians anticlockwise.
So we first sketch these new $X$, $Y$ axes and then sketch $X^2 + 4Y^2 = 1$
on the new axes as described above (by marking out the points on the
$X$ axis where $Y = 0$, and the points on the $Y$ axis where $X = 0$ and
connecting them in an ellipse). See Figure 7.3.


We will look at these ideas again later in Chapter 11. Now let’s look at
a different kind of example.

Example 7.42 Suppose we are given the linear transformation

\[ T : \mathbb{R}^2 \to \mathbb{R}^2, \qquad T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + 3y \\ -x + 5y \end{pmatrix}, \]

and we are asked to describe the effect of this linear transformation
on the $xy$-plane. At this point there isn’t much we can say (other than
perhaps sketch a unit square and see what happens to it). So suppose
we are told to change the basis in $\mathbb{R}^2$ to a new basis

\[ B = \{\mathbf{v}_1, \mathbf{v}_2\} = \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 3 \\ 1 \end{pmatrix} \right\} \]

and then to find the matrix of the linear transformation in this basis.
Call this matrix $C$. We have just seen that $C = P^{-1}AP$, where $A$ is the
matrix of the linear transformation in standard coordinates and $P$ is the
transition matrix from $B$ coordinates to standard:

\[ A = \begin{pmatrix} 1 & 3 \\ -1 & 5 \end{pmatrix}, \qquad P = \begin{pmatrix} 1 & 3 \\ 1 & 1 \end{pmatrix}. \]

Then

\[ C = P^{-1}AP = \frac{1}{2}\begin{pmatrix} -1 & 3 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 1 & 3 \\ -1 & 5 \end{pmatrix}\begin{pmatrix} 1 & 3 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix}. \]


Activity 7.43 Check this calculation by multiplying the matrices. 


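Figure 7.3 The ellipse $5x^2 + 5y^2 - 6xy = 2$

So what does this tell us? The $B$ coordinates of the $B$ basis vectors are

\[ [\mathbf{v}_1]_B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}_B \quad \text{and} \quad [\mathbf{v}_2]_B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}_B, \]

so in $B$ coordinates the linear transformation can be described as a
stretch in the direction of $\mathbf{v}_1$ by a factor of 4 and a stretch in the
direction of $\mathbf{v}_2$ by a factor of 2:

\[ [T(\mathbf{v}_1)]_B = \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix}_B = \begin{bmatrix} 4 \\ 0 \end{bmatrix}_B = 4[\mathbf{v}_1]_B, \]

and, similarly, $[T(\mathbf{v}_2)]_B = 2[\mathbf{v}_2]_B$. But the effect of $T$ is the same no
matter what basis is being used to describe it; it is only the matrices
which change. So this statement must be true even in standard
coordinates; that is, we must have

\[ A\mathbf{v}_1 = 4\mathbf{v}_1 \quad \text{and} \quad A\mathbf{v}_2 = 2\mathbf{v}_2. \]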
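The calculations of Example 7.42 are easily confirmed by machine. The sketch below, in Python with exact fractions, uses the matrices $A$ and $P$ of the example (rows $(1, 3)$, $(-1, 5)$ and rows $(1, 3)$, $(1, 1)$ respectively); the helper `mat_mul` is our own and is included only to keep the block self-contained.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 3],
     [-1, 5]]   # matrix of T in standard coordinates
P = [[1, 3],
     [1, 1]]    # columns are the basis vectors v1 and v2

# Inverse of the 2 x 2 matrix P via the adjugate formula.
(a, b), (c, d) = P
det = Fraction(a * d - b * c)
P_inv = [[d / det, -b / det],
         [-c / det, a / det]]

# Matrix of T in the basis B.
C = mat_mul(P_inv, mat_mul(A, P))

# The same statement in standard coordinates: A v1 and A v2.
v1 = [[1], [1]]
v2 = [[3], [1]]
Av1 = mat_mul(A, v1)
Av2 = mat_mul(A, v2)
print(C, Av1, Av2)
```

The diagonal entries of `C` are exactly the stretch factors along the two new basis directions.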


Activity 7.44 Check this. Show that $A\mathbf{v}_1 = 4\mathbf{v}_1$ and $A\mathbf{v}_2 = 2\mathbf{v}_2$.


In each of these examples, we were told the basis to use in $\mathbb{R}^2$ in order to
solve the questions we posed. So you should now be asking a question:
‘How did we know which basis would work for each of these examples?’
We shall begin to discover the answer in the next chapter.


7.5 Learning outcomes 


You should now be able to: 


• explain what is meant by a linear transformation and be able to prove
  a given mapping is linear

• explain what is meant by the range and null space, and rank and
  nullity of a linear transformation

• explain the rank-nullity theorem (the dimension theorem) for linear
  transformations and be able to apply it

• explain the two-way relationship between matrices and linear
  transformations

• find the matrix representation of a transformation with respect to
  two given bases

• change between different bases of a vector space

• explain what it means to say that two square matrices are similar.


7.6 Comments on activities 


Activity 7.2 To show the condition is equivalent to the other two, we
need to prove two things: first, that the two conditions imply this one
and, second, that this single condition implies the other two. So suppose
the two conditions of the definition hold and suppose that $\mathbf{u}, \mathbf{v} \in V$
and $\alpha, \beta \in \mathbb{R}$. Then we have $T(\alpha\mathbf{u}) = \alpha T(\mathbf{u})$ and $T(\beta\mathbf{v}) = \beta T(\mathbf{v})$, by
property 2, and, by property 1, we then have

\[ T(\alpha\mathbf{u} + \beta\mathbf{v}) = T(\alpha\mathbf{u}) + T(\beta\mathbf{v}) = \alpha T(\mathbf{u}) + \beta T(\mathbf{v}), \]

as required. On the other hand, suppose that for all $\mathbf{u}, \mathbf{v} \in V$ and $\alpha, \beta \in \mathbb{R}$,
we have $T(\alpha\mathbf{u} + \beta\mathbf{v}) = \alpha T(\mathbf{u}) + \beta T(\mathbf{v})$. Then property 1 follows
on taking $\alpha = \beta = 1$ and property 2 follows on taking $\beta = 0$.


Activity 7.4 You just need to use one specific example to show this. For
example, let $x = 3$ and $y = 4$. Then

\[ F_2(3 + 4) = 7p + q \neq F_2(3) + F_2(4) = (3p + q) + (4p + q) = 7p + 2q \]

since $q \neq 0$. Similarly, $F_3(3 + 4) = 49$ but $F_3(3) + F_3(4) = 25$, so $F_3$
is not linear. (Of course, you can conclude $F_2$ is not linear since
$F_2(0) = q \neq 0$.)


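Activity 7.7 $T(\alpha\mathbf{u}) = p_{\alpha\mathbf{u}}$, and $T(\mathbf{u}) = p_{\mathbf{u}}$, so we need to check that
$p_{\alpha\mathbf{u}} = \alpha p_{\mathbf{u}}$. Now, for all $x$,

\[ \begin{aligned} p_{\alpha\mathbf{u}}(x) &= p_{\alpha u_1, \alpha u_2, \ldots, \alpha u_n}(x) \\ &= (\alpha u_1)x + (\alpha u_2)x^2 + \cdots + (\alpha u_n)x^n \\ &= \alpha(u_1x + u_2x^2 + \cdots + u_nx^n) \\ &= \alpha p_{\mathbf{u}}(x), \end{aligned} \]

as required.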


Activity 7.12 Rotation by $\pi$ radians is given by the matrix $A$ below,
whereas reflection in the line $y = x$ is given by the matrix $C$:

\[ A = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad C = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \]


Activity 7.13 Since $A_T$ is $m \times n$ and $A_S$ is $p \times m$, the matrices can
be multiplied in the order $A_SA_T$ and not necessarily in the other order,
unless $p = n$. Therefore, the composite linear transformation $ST$ is
defined. But, in any case, you should still remember that this means:
first do $T$ and then do $S$.


Activity 7.15 The result that if $AB = I$, then $A = B^{-1}$ and $B = A^{-1}$
(Theorem 3.12).


Activity 7.19 You can check that $AA^{-1} = I$, or you can substitute
$u = x + y + z$, $v = x - y$, and $w = x + 2y - 3z$ into the formula for
$T^{-1}$ to see that you get back the vector $(x, y, z)^T$.


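Activity 7.22 This is very similar to the proofs in Chapter 5 that, for a
matrix $A$, $R(A)$ and $N(A)$ are subspaces.

First, we show $R(T)$ is a subspace of $W$. Note that it is non-empty
since $T(\mathbf{0}) = \mathbf{0}$ and hence it contains $\mathbf{0}$. (The fact that $T(\mathbf{0}) = \mathbf{0}$ can
be seen in a number of ways. For instance, take any $\mathbf{x} \in V$. Then
$T(\mathbf{0}) = T(0\mathbf{x}) = 0T(\mathbf{x}) = \mathbf{0}$.) We need to show that if $\mathbf{u}, \mathbf{v} \in R(T)$, then
$\mathbf{u} + \mathbf{v} \in R(T)$, and, for any $\alpha \in \mathbb{R}$, $\alpha\mathbf{v} \in R(T)$. Suppose $\mathbf{u}, \mathbf{v} \in R(T)$.
Then for some $\mathbf{y}_1, \mathbf{y}_2 \in V$, $\mathbf{u} = T(\mathbf{y}_1)$, $\mathbf{v} = T(\mathbf{y}_2)$. Now,

\[ \mathbf{u} + \mathbf{v} = T(\mathbf{y}_1) + T(\mathbf{y}_2) = T(\mathbf{y}_1 + \mathbf{y}_2), \]

and so $\mathbf{u} + \mathbf{v} \in R(T)$. Next,

\[ \alpha\mathbf{v} = \alpha T(\mathbf{y}_2) = T(\alpha\mathbf{y}_2), \]

so $\alpha\mathbf{v} \in R(T)$.

Now consider $N(T)$. It is non-empty because the fact that $T(\mathbf{0}) = \mathbf{0}$
shows $\mathbf{0} \in N(T)$. Suppose $\mathbf{u}, \mathbf{v} \in N(T)$ and $\alpha \in \mathbb{R}$. Then to show
$\mathbf{u} + \mathbf{v} \in N(T)$ and $\alpha\mathbf{u} \in N(T)$, we must show that $T(\mathbf{u} + \mathbf{v}) = \mathbf{0}$ and
$T(\alpha\mathbf{u}) = \mathbf{0}$. We have

\[ T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) = \mathbf{0} + \mathbf{0} = \mathbf{0} \]

and

\[ T(\alpha\mathbf{u}) = \alpha T(\mathbf{u}) = \alpha\mathbf{0} = \mathbf{0}, \]

so we have shown what we needed.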


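Activity 7.29 Once you have found $P^{-1}$, check that it is correct; check
that $PP^{-1} = I$. We have

\[ [\mathbf{x}]_B = P^{-1}\mathbf{x} = \frac{1}{4}\begin{pmatrix} -9 & 10 & 7 \\ -4 & 4 & 4 \\ 7 & -6 & -5 \end{pmatrix}\begin{pmatrix} 5 \\ 7 \\ -3 \end{pmatrix} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}_B. \]


Activity 7.30 The $B$ coordinates are

\[ [\mathbf{v}_1]_B = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}_B, \quad [\mathbf{v}_2]_B = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}_B, \quad [\mathbf{v}_3]_B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}_B, \]

since, for example, $\mathbf{v}_1 = 1\mathbf{v}_1 + 0\mathbf{v}_2 + 0\mathbf{v}_3$. This emphasises the fact
that the order of the vectors in $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is important, and that it
must be the same order for the columns of the matrix $P$.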


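Activity 7.33 This is just how matrix multiplication works. For example,
let’s look closely at the first column, $\mathbf{c}_1$, of the product matrix.

\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2p} \\ \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{np} \end{pmatrix} = \begin{pmatrix} c_{11} & \cdots & c_{1p} \\ c_{21} & \cdots & c_{2p} \\ \vdots & & \vdots \\ c_{m1} & \cdots & c_{mp} \end{pmatrix} \]

Here $\mathbf{b}_1$ is the first column of $B$ and $\mathbf{c}_1$ is the first column of the product.
Each entry $c_{i1}$ of column 1 of the product is obtained by taking the
inner product of the $i$th row of $A$ (regarded as a vector) with column 1
of $B$, so $c_{i1}$ is the same as the $i$th entry of $A\mathbf{b}_1$. This is true for any of
the columns of $AB$, so that $AB = (A\mathbf{b}_1 \ A\mathbf{b}_2 \ \ldots \ A\mathbf{b}_p)$.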


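Activity 7.35 We have, for $\mathbf{x}$,

\[ -25\begin{pmatrix} 3 \\ 1 \end{pmatrix} + 16\begin{pmatrix} 5 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 \\ 7 \end{pmatrix}, \quad \text{so} \quad [\mathbf{x}]_S = \begin{bmatrix} -25 \\ 16 \end{bmatrix}_S, \]

and for the basis $B$ vectors,

\[ -8\begin{pmatrix} 3 \\ 1 \end{pmatrix} + 5\begin{pmatrix} 5 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad -7\begin{pmatrix} 3 \\ 1 \end{pmatrix} + 4\begin{pmatrix} 5 \\ 2 \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \]

so these are the correct coordinates in the basis $S$. You could also check
all of these by using $[\mathbf{v}]_S = Q^{-1}\mathbf{v}$.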


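Activity 7.39 Let $I$ denote the $n \times n$ identity matrix. Then $A = I^{-1}AI$,
which shows that $A$ is similar to itself.

If $C$ is similar to $A$, then there is an invertible matrix $P$ such that
$C = P^{-1}AP$. But then $A = PCP^{-1} = (P^{-1})^{-1}CP^{-1}$, so $A$ is similar to
$C$.

If $D$ is similar to $C$, then there is an invertible matrix $Q$ such that
$D = Q^{-1}CQ$. If $C$ is similar to $A$, then there is an invertible matrix $P$
such that $C = P^{-1}AP$. Then

\[ D = Q^{-1}CQ = Q^{-1}P^{-1}APQ = (PQ)^{-1}A(PQ), \]

so $D$ is similar to $A$.


Activity 7.44 We have

\[ A\mathbf{v}_1 = \begin{pmatrix} 1 & 3 \\ -1 & 5 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 4 \end{pmatrix} = 4\mathbf{v}_1, \qquad A\mathbf{v}_2 = \begin{pmatrix} 1 & 3 \\ -1 & 5 \end{pmatrix}\begin{pmatrix} 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 6 \\ 2 \end{pmatrix} = 2\mathbf{v}_2. \]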


7.7 Exercises 


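Exercise 7.1 Find bases for the null space and range of the linear
transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ given by

\[ T\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} x_1 + x_2 + 2x_3 \\ x_1 + x_3 \\ 2x_1 + x_2 + 3x_3 \end{pmatrix}. \]

Verify the rank-nullity theorem. Is $T$ invertible?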


Exercise 7.2 Let $S$ and $T$ be the linear transformations from $\mathbb{R}^2$ to $\mathbb{R}^2$
given by the matrices


0 1 0 1 
As= (5 ae is a 


Sketch the vectors $\mathbf{e}_1$ and $\mathbf{e}_2$ in the $xy$-plane, and sketch the unit square.
Describe the effect of $S$ in words, and illustrate it using the unit square,
by adding the images $S(\mathbf{e}_1)$ and $S(\mathbf{e}_2)$ to your sketch (and filling in the
image of the unit square). Do the same for the linear transformation $T$.

Now consider the composed linear transformations $ST$ and $TS$.
Illustrate the effect of $ST$ and $TS$ using the unit square (by first
performing one linear transformation and then the other). Then calculate
their matrices to check that $ST \neq TS$.

You should also check that your matrix for $ST$ matches the images
$ST(\mathbf{e}_1)$ and $ST(\mathbf{e}_2)$ in your sketch, and do the same for $TS$.


Exercise 7.3 Consider the vectors 


1 —1 0 1 
v= (0). w= (a) »=(1] and = (2). 
1 2 5 3 


Show that $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a basis of $\mathbb{R}^3$. Find the $B$ coordinates of
$\mathbf{u}$ and hence express $\mathbf{u}$ as a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$.

A linear transformation $S : \mathbb{R}^3 \to \mathbb{R}^3$ is known to have the
following effect:

\[ S(\mathbf{v}_1) = \mathbf{e}_1, \quad S(\mathbf{v}_2) = \mathbf{e}_2, \quad S(\mathbf{v}_3) = \mathbf{e}_3, \]

where $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ are the standard basis vectors in $\mathbb{R}^3$. Using properties
of linear transformations, find $S(\mathbf{u})$.

Find, if possible, the null space of $S$ and the range of $S$.

Write down the corresponding matrix $A_S$.


Exercise 7.4 Show that the rank-nullity theorem for linear
transformations does not rule out the possibility that there exists a linear
transformation $T : \mathbb{R}^3 \to \mathbb{R}^2$, whose null space, $N(T)$, consists of vectors
$\mathbf{x} = (x, y, z)^T \in \mathbb{R}^3$ with $x = y = z$ and whose range, $R(T)$, is $\mathbb{R}^2$.

Suppose, further, that we require that $T$ maps $\mathbf{e}_1, \mathbf{e}_2 \in \mathbb{R}^3$ to the
standard basis vectors in $\mathbb{R}^2$. Find a matrix $A_T$ such that the linear
transformation $T(\mathbf{x}) = A_T\mathbf{x}$ is as required. Write down an expression
for $T(\mathbf{x})$ as a vector in $\mathbb{R}^2$ in terms of $x, y, z$.


Exercise 7.5 If S and T are the linear transformations given in the 
previous two exercises, decide which composed linear transformation, 
ST or TS, is defined, and find its corresponding matrix. 


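Exercise 7.6 Let $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \mathbf{e}_4\}$ be the standard basis of $\mathbb{R}^4$, and
let $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{x}$ be the following vectors in $\mathbb{R}^3$ (where $x, y, z$ are
constants):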


nO) Qo“) 


Let $T$ be a linear transformation, $T : \mathbb{R}^4 \to \mathbb{R}^3$, given by

\[ T(\mathbf{e}_1) = \mathbf{v}_1, \quad T(\mathbf{e}_2) = \mathbf{v}_2, \quad T(\mathbf{e}_3) = \mathbf{v}_3, \quad T(\mathbf{e}_4) = \mathbf{x}. \]

(i) Suppose the vector $\mathbf{x}$ is such that the linear transformation $T$ has

\[ \dim(R(T)) = \dim(N(T)). \]

Write down a condition that the components of $\mathbf{x}$ must satisfy for
this to happen. Find a basis of $R(T)$ in this case.

(ii) Suppose the vector $\mathbf{x}$ is such that the linear transformation has

\[ \dim(N(T)) = 1. \]

Write down a condition that the components of $\mathbf{x}$ must satisfy for
this to happen. Find a basis of $N(T)$ in this case.


Exercise 7.7 Determine for what values of the constant A, the vectors 


(Jee) 8 


form a basis of $\mathbb{R}^3$.

Let $\mathbf{b} = (2, 0, 1)^T$ and $\mathbf{s} = (2, 0, 3)^T$. Deduce that each of the sets

\[ B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{b}\} \quad \text{and} \quad S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{s}\} \]

is a basis of $\mathbb{R}^3$. Find the transition matrix $P$ from $S$ coordinates to $B$
coordinates.


If [wls = |2 š find [w]s. 
2 S 


Exercise 7.8 Consider the plane $W$ in $\mathbb{R}^3$,


nae 


Show that each of the sets 


AQ) GE GG 


is a basis of W. 

Show that the vector $\mathbf{v} = (5, 7, 3)^T$ is in $W$ and find its coordinates,
$[\mathbf{v}]_S$, in the basis $S$.

Find a transition matrix $M$ from coordinates in the basis $B$ to
coordinates in the basis $S$; that is,

\[ [\mathbf{x}]_S = M[\mathbf{x}]_B. \]


Use this to find [v] for the vector v = (5, 7, 3)' and check your answer. 


s-aytaenah 


Exercise 7.9 Suppose that T : R? — R? is the linear transformation 


given by 
x a 
a f = (-sn 1) 
X2 


—7x1 + 16x2 


Find the matrix $A_{[B,B]}$ of $T$ with respect to the bases


10.0) = GGO) 


Exercise 7.10 (For readers who have studied calculus) Consider the
vector space $F$ of functions $f : \mathbb{R} \to \mathbb{R}$ with pointwise addition and
scalar multiplication. The symbol $C^\infty(\mathbb{R})$ denotes the set of all functions
with continuous derivatives of all orders. Examples of such functions
are polynomials, $e^x$, $\cos x$, $\sin x$. Show that $C^\infty(\mathbb{R})$ is a subspace of
$F$. Show also that differentiation, $D : f \mapsto f'$, is a linear operator on
$C^\infty(\mathbb{R})$.


7.8 Problems 


Problem 7.1 Suppose T and S are linear transformations with respec- 
tive matrices: 


a= $ F) a= (ot): 

v2 V2 

(a) Sketch the effects of T and S on the standard basis, and hence on 
the unit square with sides e,, e2. Describe T and S in words. 


(b) Illustrate $ST$ and $TS$ using the unit square. Then calculate their
matrices to check that $ST \neq TS$.


Problem 7.2 Consider the vectors 


ORE] 
(Jeli) 


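(a) Show that each of the sets $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ and $B' = \{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3\}$
is a basis of $\mathbb{R}^3$.

(b) Write down the matrix $A_T$ of the linear transformation $T$ given
by $T(\mathbf{e}_1) = \mathbf{v}_1$, $T(\mathbf{e}_2) = \mathbf{v}_2$, $T(\mathbf{e}_3) = \mathbf{v}_3$, where $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\} \subset \mathbb{R}^3$
is the standard basis.

Express $T(\mathbf{x})$ for $\mathbf{x} = (x, y, z)^T$ as a vector in $\mathbb{R}^3$ (in terms of $x$, $y$
and $z$).

(c) Write down the matrix $A_S$ of the linear transformation $S$ given
by $S(\mathbf{v}_1) = \mathbf{e}_1$, $S(\mathbf{v}_2) = \mathbf{e}_2$, $S(\mathbf{v}_3) = \mathbf{e}_3$. What is the relationship
between $S$ and $T$?

(d) Write down the matrix $A_R$ of the linear transformation $R$ given by
$R(\mathbf{e}_1) = \mathbf{w}_1$, $R(\mathbf{e}_2) = \mathbf{w}_2$, $R(\mathbf{e}_3) = \mathbf{w}_3$.

(e) Is $RS$ defined? What does this linear transformation do to $\mathbf{v}_1$, $\mathbf{v}_2$
and $\mathbf{v}_3$? Find the matrix $A_{RS}$ and use it to check your answer.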


Problem 7.3 For each of the following linear transformations, find a
basis for the null space of $T$, $N(T)$, and a basis for the range of $T$,
$R(T)$. Verify the rank-nullity theorem in each case. If any of the linear
transformations are invertible, find the inverse, $T^{-1}$.

(a) T : R? → R?  given by


NY 
II 
ATTN 

= 
oot 
N 

< 
NS 


(b) T:R—> R  givenby T 


SS 
II 
DETT 

= 
sE + 
Noe 
N + 
N 
xo 


(c) T:R— R? givenby T 


"a ie 


NY SNY eR SS 


~’Y_” 
II 
ia 
o_o 
m 

O = = 
= = © 
™~JV. _” 
aN 


Problem 7.4 Consider the vectors 


1 
2 
vV = 0 


NUU 


1 4 
1 5 

> H= ita bag y3 = 3l? w= 
—1 1 2 


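Let $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ be the standard basis in $\mathbb{R}^3$ and let $T : \mathbb{R}^3 \to \mathbb{R}^4$ be the
linear transformation defined by

\[ T(\mathbf{e}_1) = \mathbf{v}_1, \quad T(\mathbf{e}_2) = \mathbf{v}_2, \quad T(\mathbf{e}_3) = \mathbf{v}_3. \]

Write down the matrix $A_T$ such that $T(\mathbf{x}) = A_T\mathbf{x}$.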

What is the dimension of R(T), the range of T? Is the vector w in 
R(T)? Justify your answers. 

State the rank—nullity theorem for linear transformations and use it 
to determine the dimension of the null space of T, N(T). 

Find a basis of N(T). 


Problem 7.5 If any of the linear transformations $T_i$ given below can
be defined, write down a matrix $A_{T_i}$ such that $T_i(\mathbf{x}) = A_{T_i}\mathbf{x}$. Otherwise,
state why $T_i$ is not defined.


Tı : R? — R’, where the null space of 7; is the x axis and the range of 
Tı is the line y = x. 
T, : R? > R?, such that N(7) = {0} and 


e(O) 


T; : R? > R?, where the null space of 73 is the line y = 2x and the 
range of T; is the line 


X 
4
Problem 7.6 Let V and W be the subspaces


l(i 


—2 


3 


A 


= Wwe j 


W = Lin a 


N 
J 


5 
i has 
0 1 


Consider the possibility of a linear transformation T and a linear trans- 
formation S such that 


T:R° SR? with N(T)=V and R(T)=W; 


S:R° >R’ with N(S)=W and RS) =F. 


Show that one of these, $S$ or $T$, cannot exist. Then find a matrix $A_S$
or $A_T$ representing the other linear transformation (with respect to the
standard basis in each Euclidean space).

Check your answer by row reducing the matrix and finding the null 
space and range of the linear transformation. 


Problem 7.7 Consider the linear transformations T : R? —> R? and 
S: R? > R? with matrices 


1 -3 1 0 1 
ar=(-2 s). a= (= 0) 
-1 3 0 2 1 


Find the null space of T, N(T), and the range of T, R(T). Describe 
each subspace or write down a basis. Do the same for the nullspace of 
S, N(S), and the range of S, R(S). 

Which linear transformation is defined: ST or T S? 

Deduce the null space of the composed linear transformation. 

Use the rank-nullity theorem to find the dimension of the range of 
the composed linear transformation. 


Problem 7.8 Let 


5 9 —1 15 4 
a=- 1 7 a), a= (4), 
1 2 0 3 1 


(a) Let $T$ denote the linear transformation $T(\mathbf{x}) = A\mathbf{x}$. Find a basis of
the null space of $T$, $N(T)$. Find the range of $T$, $R(T)$. Show that
$\mathbf{d} \in R(T)$. Find all vectors $\mathbf{x}$ such that $T(\mathbf{x}) = \mathbf{d}$.

(b) Let $S$ be a linear transformation, $S : \mathbb{R}^3 \to \mathbb{R}^2$, such that the range
of $S$ is $\mathbb{R}^2$ and the null space of $S$ is the subspace:

\[ N(S) = \{\mathbf{x} \mid \mathbf{x} = t\mathbf{d}, \ t \in \mathbb{R}\}, \]

where $\mathbf{d}$ is the vector given above.

Consider the composition $ST$ of linear transformations $S$ and
$T$. Deduce the range of $ST$ from the ranges of $T$ and $S$. Then use
the rank-nullity theorem to determine the dimension of $N(ST)$,
the null space of $ST$.

Find a basis of $N(ST)$.


Problem 7.9 Let $\mathbf{c}_1, \mathbf{c}_2, \mathbf{c}_3$ denote the columns of the matrix $P$, where


1 0 1 
p=(1 1 2). 
3-2 4 


Find the determinant of $P$. Why can you deduce that the set
$B = \{\mathbf{c}_1, \mathbf{c}_2, \mathbf{c}_3\}$ is a basis of $\mathbb{R}^3$?

If $\mathbf{w} = (1, 1, 0)^T$ in standard coordinates, find the coordinates of $\mathbf{w}$
in the basis $B$, $[\mathbf{w}]_B$.


6 
Find the vector v in standard coordinates if [v] = | — | ; 
—2 B 


Problem 7.10 Show that each of the following sets B and B isa basis 
of R?: 


a la LON), 


Write down the transition matrix P from B coordinates to standard 
coordinates. Write down the transition matrix Q from B coordinates to 
standard coordinates. 

Find the transition matrix from B coordinates to B coordinates. 


2 
If [x]; = i , find [x], . 
3 


Aa 


B 
Problem 7.11 


(a) Change the basis in $\mathbb{R}^2$ by a rotation of the axes through an angle
of $\pi/6$ clockwise: First write down the matrix of the linear
transformation which accomplishes this rotation and then write down
the new basis vectors, $v_1$ and $v_2$ (which are the images of $e_1$ and
$e_2$).
Let $B = \{v_1, v_2\}$ be the new basis. Write down the transition
matrix P from B coordinates to standard coordinates.

(b) The curve C is given in standard coordinates, x, y, by the equation
$3x^2 + 2\sqrt{3}\,xy + 5y^2 = 6$. Find the equation of the curve in the
new B coordinates, (X, Y).
Use this information to sketch the curve C in the xy-plane.


Problem 7.12 Suppose 


wf). GD} = G). 16)=(258) 


(a) Show that M is a basis of $\mathbb{R}^2$. Write down the transition matrix
from M coordinates to standard coordinates. Find $[v]_M$, the M
coordinates of the vector v.

(b) Write down the matrix A of the linear transformation

\[ T : \mathbb{R}^2 \to \mathbb{R}^2 \]

with respect to the standard basis.
Find the matrix of T in M coordinates. Call it D.
Describe geometrically the effect of the transformation T as a map
from $\mathbb{R}^2$ to $\mathbb{R}^2$.

(c) Find the image of $[v]_M$ using the matrix D.
Check your answer using standard coordinates.


8 


Diagonalisation 


One of the most useful techniques in applications of matrices and linear
algebra is diagonalisation. This relies on the topic of eigenvalues and
eigenvectors, and is related to change of basis. We will learn how to find
eigenvalues and eigenvectors of an $n \times n$ matrix, how to diagonalise a
matrix when it is possible to do so and also how to recognise when it
is not possible. We shall see in the next chapter how useful a technique
diagonalisation is.

All matrices in this chapter are square $n \times n$ matrices with real
entries, so all vectors will be in $\mathbb{R}^n$ for some n.


8.1 Eigenvalues and eigenvectors


8.1.1 Definition of eigenvalues and eigenvectors


The first important ideas we need are those of eigenvalues and their
corresponding eigenvectors.


Definition 8.1 Suppose that A is a square matrix. The number $\lambda$ is said
to be an eigenvalue of A if for some non-zero vector x,

\[ Ax = \lambda x. \]

Any non-zero vector x for which this equation holds is called an
eigenvector for eigenvalue $\lambda$ or an eigenvector of A corresponding to
eigenvalue $\lambda$.


8.1.2 Finding eigenvalues and eigenvectors 


To determine whether $\lambda$ is an eigenvalue of A, we need to determine
whether there are any non-zero solutions x to the matrix equation
$Ax = \lambda x$. Note that the matrix equation $Ax = \lambda x$ is not of the
standard form, since the right-hand side is not a fixed vector b, but depends
explicitly on x. However, we can rewrite it in standard form. Note that
$\lambda x = \lambda I x$, where I is, as usual, the identity matrix. So, the equation
is equivalent to $Ax = \lambda I x$, or $Ax - \lambda I x = 0$, which is equivalent to
$(A - \lambda I)x = 0$.

Now, a square linear system $Bx = 0$ has solutions other than $x = 0$
precisely when $|B| = 0$. Therefore, taking $B = A - \lambda I$, $\lambda$ is an
eigenvalue if and only if the determinant of the matrix $A - \lambda I$ is zero. This
determinant, $p(\lambda) = |A - \lambda I|$, is a polynomial of degree n in the
variable $\lambda$.


Definition 8.2 (Characteristic polynomial and equation) The
polynomial $|A - \lambda I|$ is known as the characteristic polynomial of A, and
the equation $|A - \lambda I| = 0$ is called the characteristic equation of A.


To find the eigenvalues, we solve the characteristic equation
$|A - \lambda I| = 0$. Let us illustrate with a $2 \times 2$ example.


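Example 8.3 Let

\[ A = \begin{pmatrix} 7 & -15 \\ 2 & -4 \end{pmatrix}. \]

Then

\[ A - \lambda I = \begin{pmatrix} 7 & -15 \\ 2 & -4 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 7 - \lambda & -15 \\ 2 & -4 - \lambda \end{pmatrix} \]

and the characteristic polynomial is

\[ \begin{aligned}
|A - \lambda I| &= \begin{vmatrix} 7 - \lambda & -15 \\ 2 & -4 - \lambda \end{vmatrix} \\
&= (7 - \lambda)(-4 - \lambda) + 30 \\
&= \lambda^2 - 3\lambda - 28 + 30 \\
&= \lambda^2 - 3\lambda + 2.
\end{aligned} \]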

So the eigenvalues are the solutions of $\lambda^2 - 3\lambda + 2 = 0$. To solve this
for $\lambda$, we could use either the formula for the solutions to a quadratic
equation, or simply observe that the characteristic polynomial
factorises. We have $(\lambda - 1)(\lambda - 2) = 0$ with solutions $\lambda = 1$ and $\lambda = 2$.
Hence the eigenvalues of A are 1 and 2, and these are the only
eigenvalues of A.
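The calculation in Example 8.3 can also be checked by computer. The following is a minimal sketch in Python, not part of the original text: the function name `eigenvalues_2x2` is our own, and it assumes a 2 x 2 matrix whose eigenvalues are real. It uses the fact that the characteristic polynomial of a 2 x 2 matrix with rows (a, b) and (c, d) is $\lambda^2 - (a + d)\lambda + (ad - bc)$.

```python
import math


def eigenvalues_2x2(m):
    """Real roots of the characteristic polynomial of a 2 x 2 matrix.

    Illustrative helper (our own name): m is [[a, b], [c, d]].
    """
    (a, b), (c, d) = m
    trace = a + d           # minus the coefficient of lambda
    det = a * d - b * c     # the constant term
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("the characteristic equation has no real solutions")
    root = math.sqrt(disc)
    return sorted([(trace - root) / 2, (trace + root) / 2])


A = [[7, -15], [2, -4]]
print(eigenvalues_2x2(A))  # [1.0, 2.0]
```

For larger matrices there is no such simple formula, which is one reason the text works with the characteristic polynomial and factorisation by hand.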


To find an eigenvector for each eigenvalue $\lambda$, we have to find a
non-trivial solution to $(A - \lambda I)x = 0$, meaning a solution other than the zero
vector. (We stress the fact that eigenvectors cannot be the zero vector
because this is a mistake many students make.) This is easy, since for a
particular value of $\lambda$, all we need to do is solve a simple linear system.
We illustrate by finding the eigenvectors for the matrix of Example 8.3.


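Example 8.4 We find the eigenvectors of

\[ A = \begin{pmatrix} 7 & -15 \\ 2 & -4 \end{pmatrix}. \]

We have seen that the eigenvalues are 1 and 2. To find the eigenvectors
for eigenvalue 1, we solve the system $(A - I)x = 0$. We do this by
putting the coefficient matrix $A - I$ into reduced echelon form.

\[ (A - I) = \begin{pmatrix} 6 & -15 \\ 2 & -5 \end{pmatrix} \longrightarrow \cdots \longrightarrow \begin{pmatrix} 1 & -\frac{5}{2} \\ 0 & 0 \end{pmatrix}. \]

This system has solutions

\[ v = t \begin{pmatrix} 5 \\ 2 \end{pmatrix}, \quad \text{for any } t \in \mathbb{R}. \]

There are infinitely many eigenvectors for 1: for each $t \neq 0$, v is an
eigenvector of A corresponding to $\lambda = 1$. But be careful not to think
that you can choose $t = 0$; for then v becomes the zero vector, and this
is never an eigenvector, simply by definition. To find the eigenvectors
for 2, we solve $(A - 2I)x = 0$ by reducing the coefficient matrix,

\[ (A - 2I) = \begin{pmatrix} 5 & -15 \\ 2 & -6 \end{pmatrix} \longrightarrow \cdots \longrightarrow \begin{pmatrix} 1 & -3 \\ 0 & 0 \end{pmatrix}. \]

Setting the non-leading variable equal to t, we obtain the solutions

\[ v = t \begin{pmatrix} 3 \\ 1 \end{pmatrix}, \quad t \in \mathbb{R}. \]

Any non-zero scalar multiple of the vector $(3, 1)^T$ is an eigenvector of
A for eigenvalue 2.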
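A direct way to check an answer such as the one in Example 8.4 is to multiply out Av and compare it with $\lambda v$. The sketch below does this in Python; the helper names `mat_vec` and `is_eigenpair` are our own and are only meant as an illustration. Note that the zero vector is rejected explicitly, in line with the definition.

```python
def mat_vec(m, v):
    """Multiply a matrix (a list of rows) by a vector (a list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def is_eigenpair(m, lam, v):
    """True if v is non-zero and m v equals lam v."""
    if all(x == 0 for x in v):
        return False  # the zero vector is never an eigenvector
    return mat_vec(m, v) == [lam * x for x in v]


A = [[7, -15], [2, -4]]
print(is_eigenpair(A, 1, [5, 2]))  # True
print(is_eigenpair(A, 2, [3, 1]))  # True
print(is_eigenpair(A, 2, [0, 0]))  # False
```

With integer entries the comparison is exact; with floating-point entries one would compare up to a small tolerance instead.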


Note that, in this example, each system of equations is simple
enough to be solved directly. For example, if $x = (x_1, x_2)^T$, the system
$(A - 2I)x = 0$ consists of the equations

\[ 5x_1 - 15x_2 = 0, \qquad 2x_1 - 6x_2 = 0. \]

Clearly, both equations are equivalent to $x_1 = 3x_2$. If we set $x_2 = t$ for
any real number t, then we obtain the eigenvectors for $\lambda = 2$ as before.
However, we prefer to use row operations. There are two reasons for this.
The first reason is that the system of equations may not be as simple
as the one just given, particularly for an $n \times n$ matrix where $n > 2$.
The second reason is that putting the matrix $A - \lambda I$ into echelon form
provides a useful check on the eigenvalue. If $|A - \lambda I| = 0$, the echelon
form of $A - \lambda I$ must have a row of zeros, so the system $(A - \lambda I)x = 0$
has a non-trivial solution. If we have reduced the matrix $(A - \lambda_0 I)$ for
some supposed eigenvalue $\lambda_0$ and do not obtain a zero row, we know
immediately that there is an error, either in the row reduction or in the
choice of $\lambda_0$, and we can go back and correct it.
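The zero-row check just described can be illustrated with a short Python sketch. The functions `rref`, `shifted` and `has_zero_row` are our own illustrative helpers, not part of the text; exact fractions are used so that no rounding error can hide or create a zero row. The value 3 is included as a deliberately wrong "supposed eigenvalue".

```python
from fractions import Fraction


def rref(m):
    """Reduced row echelon form of a matrix, using exact arithmetic."""
    rows = [[Fraction(x) for x in row] for row in m]
    lead = 0  # index of the next pivot row
    for col in range(len(rows[0])):
        pivot = next((i for i in range(lead, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no leading entry in this column
        rows[lead], rows[pivot] = rows[pivot], rows[lead]
        rows[lead] = [x / rows[lead][col] for x in rows[lead]]
        for i in range(len(rows)):
            if i != lead:
                factor = rows[i][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[lead])]
        lead += 1
    return rows


def shifted(m, lam):
    """Return the matrix m - lam * I."""
    return [[x - (lam if i == j else 0) for j, x in enumerate(row)]
            for i, row in enumerate(m)]


def has_zero_row(m):
    return any(all(x == 0 for x in row) for row in m)


A = [[7, -15], [2, -4]]
for lam in (1, 2, 3):
    print(lam, has_zero_row(rref(shifted(A, lam))))
# 1 True
# 2 True
# 3 False
```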

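We now give two examples with $3 \times 3$ matrices.


Example 8.5 Suppose that

\[ A = \begin{pmatrix} 4 & 0 & 4 \\ 0 & 4 & 4 \\ 4 & 4 & 8 \end{pmatrix}. \]

Let’s find the eigenvalues of A and corresponding eigenvectors for each
eigenvalue.
To find the eigenvalues, we solve $|A - \lambda I| = 0$. Now,

\[ \begin{aligned}
|A - \lambda I| &= \begin{vmatrix} 4 - \lambda & 0 & 4 \\ 0 & 4 - \lambda & 4 \\ 4 & 4 & 8 - \lambda \end{vmatrix} \\
&= (4 - \lambda) \begin{vmatrix} 4 - \lambda & 4 \\ 4 & 8 - \lambda \end{vmatrix} + 4 \begin{vmatrix} 0 & 4 - \lambda \\ 4 & 4 \end{vmatrix} \\
&= (4 - \lambda)\big((4 - \lambda)(8 - \lambda) - 16\big) + 4\big(-4(4 - \lambda)\big) \\
&= (4 - \lambda)\big((4 - \lambda)(8 - \lambda) - 16\big) - 16(4 - \lambda).
\end{aligned} \]

We notice that each of the two terms in this expression has $4 - \lambda$ as
a factor, so instead of expanding everything, we take $4 - \lambda$ out as a
common factor, obtaining

\[ \begin{aligned}
|A - \lambda I| &= (4 - \lambda)\big((4 - \lambda)(8 - \lambda) - 16 - 16\big) \\
&= (4 - \lambda)(32 - 12\lambda + \lambda^2 - 32) \\
&= (4 - \lambda)(\lambda^2 - 12\lambda) \\
&= (4 - \lambda)\lambda(\lambda - 12).
\end{aligned} \]

It follows that the eigenvalues are 4, 0, 12. (The characteristic
polynomial will not always factorise so easily. Here it was simple because of
the common factor $4 - \lambda$. The next example is more difficult.)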

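To find an eigenvector for 4, we have to solve the equation
$(A - 4I)x = 0$ for $x = (x_1, x_2, x_3)^T$. Using row operations, we have

\[ \begin{pmatrix} 0 & 0 & 4 \\ 0 & 0 & 4 \\ 4 & 4 & 4 \end{pmatrix} \longrightarrow \cdots \longrightarrow \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}. \]

Thus, $x_3 = 0$ and setting the free variable $x_2 = t$, the solutions are

\[ x = t \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \quad t \in \mathbb{R}. \]

So the eigenvectors for $\lambda = 4$ are the non-zero multiples of

\[ \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}. \]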
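The results of Example 8.5 can be confirmed numerically. The following Python sketch (with our own helper names `det3` and `char_poly_value`, assumed only for illustration) evaluates the determinant $|A - \lambda I|$ at the three eigenvalues, at one value that is not an eigenvalue, and then checks the eigenvector found for 4.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_poly_value(m, lam):
    """Evaluate |m - lam * I| for a 3 x 3 matrix m."""
    shifted = [[x - (lam if r == s else 0) for s, x in enumerate(row)]
               for r, row in enumerate(m)]
    return det3(shifted)


A = [[4, 0, 4], [0, 4, 4], [4, 4, 8]]

print([char_poly_value(A, lam) for lam in (4, 0, 12)])  # [0, 0, 0]
print(char_poly_value(A, 1))                            # -33, so 1 is not an eigenvalue

# The eigenvector found in the example for the eigenvalue 4.
v = [-1, 1, 0]
Av = [sum(a * x for a, x in zip(row, v)) for row in A]
print(Av == [4 * x for x in v])                         # True
```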


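Activity 8.6 Determine the eigenvectors for 0 and 12. Check your
answers: verify that $Av = \lambda v$ for each eigenvalue and one
corresponding eigenvector.


Example 8.7 Let

\[ A = \begin{pmatrix} -3 & -1 & -2 \\ 1 & -1 & 1 \\ 1 & 1 & 0 \end{pmatrix}. \]

Given that $-1$ is an eigenvalue of A, find all the eigenvalues of A.
We calculate the characteristic polynomial of A:

\[ \begin{aligned}
|A - \lambda I| &= \begin{vmatrix} -3 - \lambda & -1 & -2 \\ 1 & -1 - \lambda & 1 \\ 1 & 1 & -\lambda \end{vmatrix} \\
&= (-3 - \lambda) \begin{vmatrix} -1 - \lambda & 1 \\ 1 & -\lambda \end{vmatrix} + \begin{vmatrix} 1 & 1 \\ 1 & -\lambda \end{vmatrix} - 2 \begin{vmatrix} 1 & -1 - \lambda \\ 1 & 1 \end{vmatrix} \\
&= (-3 - \lambda)(\lambda^2 + \lambda - 1) + (-\lambda - 1) - 2(\lambda + 2) \\
&= -\lambda^3 - 4\lambda^2 - 5\lambda - 2 = -(\lambda^3 + 4\lambda^2 + 5\lambda + 2).
\end{aligned} \]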

Now, the fact that $-1$ is an eigenvalue means that $-1$ is a solution of the
equation $|A - \lambda I| = 0$, which means that $\lambda - (-1)$ (that is, $\lambda + 1$) is a
factor of the characteristic polynomial $|A - \lambda I|$. So this characteristic
polynomial can be written in the form

\[ -(\lambda + 1)(a\lambda^2 + b\lambda + c). \]

Clearly, we must have $a = 1$ and $c = 2$ to obtain the correct $\lambda^3$
term and the correct constant. So the polynomial can be written as
$-(\lambda + 1)(\lambda^2 + b\lambda + 2)$. Using this, and comparing the coefficients of
either $\lambda^2$ or $\lambda$ with the cubic polynomial, we find $b = 3$. For instance,
think about the term involving $\lambda^2$. We know that the characteristic
polynomial has the following term: $-4\lambda^2$. On the other hand, if we look
at how the expression $-(\lambda + 1)(\lambda^2 + b\lambda + 2)$ would be expanded, it
would generate the term $-\lambda^2 - b\lambda^2$. So we must have $-1 - b = -4$
and hence $b = 3$. In other words, the characteristic polynomial is

\[ -(\lambda^3 + 4\lambda^2 + 5\lambda + 2) = -(\lambda + 1)(\lambda^2 + 3\lambda + 2) = -(\lambda + 1)(\lambda + 2)(\lambda + 1). \]


Activity 8.8 Perform the calculations to check that $b = 3$ and that the
characteristic polynomial factorises as stated.


We have $|A - \lambda I| = -(\lambda + 1)^2(\lambda + 2)$. The eigenvalues are the
solutions to $|A - \lambda I| = 0$, so they are $\lambda = -1$ and $\lambda = -2$.

Note that in this case there are only two distinct eigenvalues. We
say that the eigenvalue $-1$ has occurred twice, or that $\lambda = -1$ is an
eigenvalue of multiplicity 2. We will find the eigenvectors when we
look at this example again in Section 8.3.
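A factorisation like the one in Example 8.7 is easy to check by multiplying the factors back together. Here is a small Python sketch of that check; the representation (a polynomial as a list of coefficients, constant term first) and the name `poly_mul` are our own choices for illustration.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


# -(lambda + 1) is [-1, -1]; lambda^2 + 3 lambda + 2 is [2, 3, 1].
print(poly_mul([-1, -1], [2, 3, 1]))                 # [-2, -5, -4, -1]

# -(lambda + 1)(lambda + 1)(lambda + 2) gives the same cubic.
print(poly_mul(poly_mul([-1, -1], [1, 1]), [2, 1]))  # [-2, -5, -4, -1]
```

Reading the result from the constant term upwards, both products are $-2 - 5\lambda - 4\lambda^2 - \lambda^3$, which agrees with the characteristic polynomial calculated in the example.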


8.1.3 Eigenspaces 


If A is an $n \times n$ matrix and $\lambda$ is an eigenvalue of A, then the set of
eigenvectors corresponding to the eigenvalue $\lambda$ together with the zero
vector, 0, is a subspace of $\mathbb{R}^n$. Why?

We have already seen that the null space of any $m \times n$ matrix is a
subspace of $\mathbb{R}^n$. The null space of the $n \times n$ matrix $A - \lambda I$ consists of
all solutions to the matrix equation $(A - \lambda I)x = 0$, which is precisely
the set of all eigenvectors corresponding to $\lambda$, together with the vector
0. We give this a special name.


Definition 8.9 (Eigenspace) If A is an $n \times n$ matrix and $\lambda$ is an
eigenvalue of A, then the eigenspace of the eigenvalue $\lambda$ is the subspace
$N(A - \lambda I)$ of $\mathbb{R}^n$.


The eigenspace of an eigenvalue $\lambda$ can also be described as the set S,
where

\[ S = \{ x \mid Ax = \lambda x \}. \]


Activity 8.10 Show this.


In Exercise 5.2, you showed that the set $S = \{ x \mid Ax = \lambda x \}$ is a subspace
of $\mathbb{R}^n$ for any $\lambda \in \mathbb{R}$. If $\lambda$ is not an eigenvalue of A, then S contains only
the zero vector, $S = \{0\}$, and $\dim(S) = 0$. When, and only when, $\lambda$ is
an eigenvalue of A do we know that there is a non-zero vector in S, and
hence $\dim(S) \geq 1$. In this case, S is the eigenspace of the eigenvalue $\lambda$.




8.1.4 Eigenvalues and the matrix 


We now explore how the eigenvalues of a matrix are related to other
quantities associated with it, specifically the determinant (with which
we are already familiar) and the trace.

There is a straightforward relationship between the eigenvalues of
a matrix A and its determinant. Suppose A is an $n \times n$ matrix. Then the
characteristic polynomial of A is a polynomial of degree n in $\lambda$:

\[ p(\lambda) = |A - \lambda I| = (-1)^n (\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_0). \]

Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of A, with multiple roots listed
each time they occur. In terms of the eigenvalues, the characteristic
polynomial factors as

\[ p(\lambda) = |A - \lambda I| = (-1)^n (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_n). \]

For instance, let’s look at the matrix in Example 8.7,

\[ A = \begin{pmatrix} -3 & -1 & -2 \\ 1 & -1 & 1 \\ 1 & 1 & 0 \end{pmatrix}. \]

The eigenvalues of A are $\lambda = -1$, of multiplicity 2, and $\lambda = -2$. So
we may list the eigenvalues as $\lambda_1 = \lambda_2 = -1$ and $\lambda_3 = -2$. Then,

\[ p(\lambda) = (-1)^3 (\lambda - \lambda_1)(\lambda - \lambda_2)(\lambda - \lambda_3) = -(\lambda + 1)(\lambda + 1)(\lambda + 2), \]

as we saw earlier.
If we let $\lambda = 0$ in the equation

\[ p(\lambda) = |A - \lambda I| = (-1)^n (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_n), \]

then we obtain the constant term of the polynomial,

\[ p(0) = |A| = (-1)^n a_0 = (-1)^n (-1)^n (\lambda_1 \lambda_2 \cdots \lambda_n) = \lambda_1 \lambda_2 \cdots \lambda_n. \]

Therefore, we have proved the following.


Theorem 8.11 The determinant of an $n \times n$ matrix A is equal to the
product of its eigenvalues.


Example 8.12 Let’s look again at the matrix in Example 8.7,

\[ A = \begin{pmatrix} -3 & -1 & -2 \\ 1 & -1 & 1 \\ 1 & 1 & 0 \end{pmatrix}. \]

The eigenvalues of A are $\lambda_1 = \lambda_2 = -1$ and $\lambda_3 = -2$. Calculating
the determinant of the matrix, we find $|A| = -2$, which is indeed the
product of the three eigenvalues.


Activity 8.13 Check this. Calculate $|A|$.


Now look at the sum of the diagonal entries of the matrix in the example
just given, and notice that it is equal to the sum of the three eigenvalues.
This is true in general. The sum of the diagonal entries of a matrix is
known as the trace of the matrix.


Definition 8.14 (Trace) The trace of a square matrix A is the sum of
the entries on its main diagonal.


Theorem 8.15 The trace of an $n \times n$ matrix A is equal to the sum of
its eigenvalues.


Proof: We can obtain this result by examining the equations

\[ \begin{aligned}
|A - \lambda I| &= (-1)^n (\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_0) \\
&= (-1)^n (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_n)
\end{aligned} \]

again, this time looking at the coefficient of $\lambda^{n-1}$. You can consider the
proof optional and safely omit it, but if you wish to see how it works,
read on.

The coefficient of $\lambda^{n-1}$ is $(-1)^n a_{n-1}$ in the middle expression, but
what we are actually interested in is how the coefficient of $\lambda^{n-1}$ is
obtained from the other two expressions. First, think about how it is
obtained from the factorised polynomial

\[ (-1)^n (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_n) \]

when the factors are multiplied together. Ignoring the $(-1)^n$ for the
moment, if we multiply all the $\lambda$s together, one from each factor, we
obtain the term $\lambda^n$. So to obtain the terms with $\lambda^{n-1}$, we need to multiply
first $-\lambda_1$ times the $\lambda$s in all the remaining factors, then $-\lambda_2$ times the
$\lambda$s in all the other factors and so on. Putting back the factor $(-1)^n$, the
term involving $\lambda^{n-1}$ is

\[ (-1)^n (-\lambda_1 - \lambda_2 - \cdots - \lambda_n)\lambda^{n-1} = (-1)^{n-1} (\lambda_1 + \lambda_2 + \cdots + \lambda_n)\lambda^{n-1}. \qquad (1) \]

Now let’s look at the coefficient of $\lambda^{n-1}$ in the expansion of the
determinant, $|A - \lambda I|$. This is far more complicated, and we will need an
inductive argument.
If A is a $2 \times 2$ matrix,

\[ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \]

then

\[ |A - \lambda I| = \begin{vmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{vmatrix} = \lambda^2 - (a_{11} + a_{22})\lambda + |A|. \]

We see that the coefficient of $\lambda$ is $(-1)$ times the trace of A.
Now consider a $3 \times 3$ matrix A. We have

\[ |A - \lambda I| = \begin{vmatrix} a_{11} - \lambda & a_{12} & a_{13} \\ a_{21} & a_{22} - \lambda & a_{23} \\ a_{31} & a_{32} & a_{33} - \lambda \end{vmatrix}. \]

Expanding by the first row, we see that the only term which contains
powers of $\lambda$ higher than 1 comes from the (1, 1) entry times the (1, 1)
cofactor; that is, $(a_{11} - \lambda)C_{11}$. But $C_{11}$ is the determinant of a $2 \times 2$
matrix, so we are looking at the $\lambda^2$ terms of

\[ (a_{11} - \lambda)\big(\lambda^2 - (a_{22} + a_{33})\lambda + (a_{22}a_{33} - a_{23}a_{32})\big). \]

The $\lambda^2$ terms are

\[ a_{11}\lambda^2 + (a_{22} + a_{33})\lambda^2 = (a_{11} + a_{22} + a_{33})\lambda^2, \]

so the coefficient of $\lambda^2$ is $(-1)^2$ times the trace of A.

What we have seen so far makes us fairly certain that the coefficient
of the term $\lambda^{n-1}$ in the expansion of the determinant $|A - \lambda I|$ for an
$n \times n$ matrix A is equal to $(-1)^{n-1}$ times the trace of A. We have shown
that this is true for any $2 \times 2$ and any $3 \times 3$ matrix. We now assume it
is true for any $(n - 1) \times (n - 1)$ matrix, and then show that this implies
it is also true for any $n \times n$ matrix. In this way, starting with $n = 2$, we
will know it is true for all $n \times n$ matrices.

So suppose $A = (a_{ij})$ is an $n \times n$ matrix and look at the coefficient
of $\lambda^{n-1}$ in the cofactor expansion of $|A - \lambda I|$ by row 1:

\[ |A - \lambda I| = \begin{vmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} - \lambda & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda \end{vmatrix} = (a_{11} - \lambda)C_{11} + a_{12}C_{12} + \cdots. \]

Only the first term of the cofactor expansion, $(a_{11} - \lambda)C_{11}$, contains
higher powers of $\lambda$ than $\lambda^{n-2}$.


Activity 8.16 Look at the other terms to see why this is true.


Now $C_{11}$ is the determinant of the $(n - 1) \times (n - 1)$ matrix obtained
from the matrix $(A - \lambda I)$ by crossing out the first row and first
column, so it is of the form $|C - \lambda I|$, where C is the $(n - 1) \times (n - 1)$
matrix obtained from A by crossing out the first row and first column.
Therefore, by our assumption,

\[ \begin{aligned}
|A - \lambda I| &= (a_{11} - \lambda)C_{11} + \cdots \\
&= (a_{11} - \lambda)\big((-1)^{n-1}\lambda^{n-1} + (-1)^{n-2}(a_{22} + \cdots + a_{nn})\lambda^{n-2} + \cdots\big) + \cdots \\
&= (-1)^n \lambda^n + (-1)^{n-1}(a_{11} + a_{22} + \cdots + a_{nn})\lambda^{n-1} + \cdots.
\end{aligned} \]

We can now conclude that the term involving $\lambda^{n-1}$ in the expansion of
$|A - \lambda I|$ for any $n \times n$ matrix A is equal to

\[ (-1)^{n-1}(a_{11} + a_{22} + a_{33} + \cdots + a_{nn})\lambda^{n-1}. \qquad (2) \]

Comparing the coefficients of $\lambda^{n-1}$ in the two expressions (1) and (2),
we see that

\[ a_{11} + a_{22} + a_{33} + \cdots + a_{nn} = \lambda_1 + \lambda_2 + \cdots + \lambda_n; \]

that is, the trace of A is equal to the sum of the eigenvalues.

8.2 Diagonalisation of a square matrix 
8.2.1 Diagonalisation 


Recall that square matrices A and M are similar if there is an invertible
matrix P such that $P^{-1}AP = M$. We met this idea earlier when we
looked at how a matrix representing a linear transformation changes
when the basis is changed. We now begin to explore why this is such
an important and useful concept.


Definition 8.17 (Diagonalisable matrix) The matrix A is
diagonalisable if it is similar to a diagonal matrix; in other words, if there is a
diagonal matrix D and an invertible matrix P such that $P^{-1}AP = D$.


When we find suitable P and D such that $P^{-1}AP = D$, we say that we
are diagonalising A.


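Example 8.18 The matrix

\[ A = \begin{pmatrix} 7 & -15 \\ 2 & -4 \end{pmatrix} \]

from Example 8.3 is diagonalisable, because if we take P to be

\[ P = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix}, \]

then P is invertible, with

\[ P^{-1} = \begin{pmatrix} -1 & 3 \\ 2 & -5 \end{pmatrix}, \]

and, as you can check,

\[ P^{-1}AP = D = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}, \]

which is a diagonal matrix.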
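The product in Example 8.18 can be multiplied out by computer in the same order as one would do it by hand: first AP, then $P^{-1}(AP)$. The following Python sketch does this with exact fractions; the helper names `mat_mul` and `inverse_2x2` are our own, and the inverse uses the standard formula for a 2 x 2 matrix.

```python
from fractions import Fraction


def mat_mul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix [[a, b], [c, d]] with non-zero determinant."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[7, -15], [2, -4]]
P = [[5, 3], [2, 1]]

AP = mat_mul(A, P)
D = mat_mul(inverse_2x2(P), AP)

print(AP)                      # [[5, 6], [2, 2]]
print(D == [[1, 0], [0, 2]])   # True
```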


Activity 8.19 Check this! Obtain the product $P^{-1}AP$ by first
multiplying AP and then multiplying on the left by $P^{-1}$. What do you notice
about AP?


The example just given probably raises a number of questions in your
mind. Prominent among those will be: ‘How was such a matrix P
found?’ (Have a look back at Example 8.4. What do you notice?) A
more general question is: ‘When will a matrix be diagonalisable?’ To
answer both of these questions, we start by outlining a general method
for diagonalising a matrix (when it is possible).


8.2.2 General method 


Let’s first suppose that the $n \times n$ matrix A is diagonalisable. So, assume
that $P^{-1}AP = D$, where D is a diagonal matrix

\[ D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}. \]

(Note the useful notation for describing the diagonal matrix D.) Then,
since $P^{-1}AP = D$, we have $AP = PD$. Suppose the columns of P
are the vectors $v_1, v_2, \ldots, v_n$. Then, thinking about how matrix
multiplication works (see Activity 7.33), we can see that

\[ AP = A(v_1 \ \ldots \ v_n) = (Av_1 \ \ldots \ Av_n). \]

Furthermore,

\[ PD = (v_1 \ \ldots \ v_n) \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} = (\lambda_1 v_1 \ \ldots \ \lambda_n v_n). \]

So this means that

\[ Av_1 = \lambda_1 v_1, \quad Av_2 = \lambda_2 v_2, \quad \ldots, \quad Av_n = \lambda_n v_n. \]

The fact that $P^{-1}$ exists means that none of the vectors $v_i$ is the zero
vector (because any matrix with a column of zeros would not be
invertible). So this means that (for $i = 1, 2, \ldots, n$), $v_i$ is a non-zero vector
with the property that $Av_i = \lambda_i v_i$. But this means precisely that $\lambda_i$ is
an eigenvalue of A and that $v_i$ is a corresponding eigenvector. Since P
has an inverse, these eigenvectors are linearly independent. Therefore,
A has n linearly independent eigenvectors.

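Conversely, suppose A has n linearly independent eigenvectors,
$v_1, v_2, \ldots, v_n$, which correspond to eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Let P
be the matrix whose columns are these eigenvectors: $P = (v_1 \ \ldots \ v_n)$.
Because the columns are linearly independent, P will be invertible.
Furthermore, since $Av_i = \lambda_i v_i$, it follows that

\[ \begin{aligned}
AP &= A(v_1 \ \ldots \ v_n) \\
&= (Av_1 \ \ldots \ Av_n) \\
&= (\lambda_1 v_1 \ \ldots \ \lambda_n v_n) \\
&= (v_1 \ \ldots \ v_n) \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} \\
&= PD,
\end{aligned} \]

where $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ is the diagonal matrix whose entries
are the eigenvalues. The fact that P is invertible then implies that
$P^{-1}AP = P^{-1}PD = D$. So it follows that A is diagonalisable and the
matrix P is such that $P^{-1}AP$ is a diagonal matrix.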


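Example 8.20 Now it should be clear where P in Example 8.18 came
from. In Examples 8.3 and 8.4, we discovered that the eigenvalues of

\[ A = \begin{pmatrix} 7 & -15 \\ 2 & -4 \end{pmatrix} \]

are 1 and 2 and that corresponding eigenvectors are

\[ v_1 = \begin{pmatrix} 5 \\ 2 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}. \]

This is why, if we take

\[ P = (v_1 \ v_2) = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix}, \]

then P is invertible and $P^{-1}AP$ is the diagonal matrix

\[ D = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}. \]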


Moreover, the general discussion we have given establishes the
following important result:


Theorem 8.21 An $n \times n$ matrix A is diagonalisable if and only if it has
n linearly independent eigenvectors.


Since n linearly independent vectors in $\mathbb{R}^n$ form a basis of $\mathbb{R}^n$, another
way to state this theorem is:


Theorem 8.22 An $n \times n$ matrix A is diagonalisable if and only if there
is a basis of $\mathbb{R}^n$ consisting of eigenvectors of A.


Example 8.23 In Example 8.5 (and Activity 8.6), we found the
eigenvalues and eigenvectors of the matrix

\[ A = \begin{pmatrix} 4 & 0 & 4 \\ 0 & 4 & 4 \\ 4 & 4 & 8 \end{pmatrix}. \]

We will now diagonalise A. We have seen that it has three
distinct eigenvalues 0, 4, 12. From the eigenvectors we found, we
take one eigenvector corresponding to each of the eigenvalues
$\lambda_1 = 4$, $\lambda_2 = 0$, $\lambda_3 = 12$, in that order,

\[ v_1 = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} -1 \\ -1 \\ 1 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}. \]

We now form the matrix P whose columns are these eigenvectors:

\[ P = \begin{pmatrix} -1 & -1 & 1 \\ 1 & -1 & 1 \\ 0 & 1 & 2 \end{pmatrix}. \]

Then we know that D will be the matrix

\[ D = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 12 \end{pmatrix}. \]

You can choose any order for listing the eigenvectors as the columns of
the matrix P, as long as you write the corresponding eigenvalues in the
corresponding columns of D; that is, as long as the column orders in P
and D match. (If, for example, we had instead chosen $P = (v_2 \ v_1 \ v_3)$,
then D would instead be diag(0, 4, 12).)
As soon as you have written down the matrices P and D, you should
check that your eigenvectors are correct. That is, check that

\[ AP = (Av_1 \ Av_2 \ Av_3) = (\lambda_1 v_1 \ \lambda_2 v_2 \ \lambda_3 v_3) = PD. \]


Activity 8.24 Carry out this calculation to check that the eigenvectors
are correct; that is, check that the columns of P are eigenvectors of A
corresponding to the eigenvalues 4, 0, 12.


Then, according to the theory, if P has an inverse (that is, if the
eigenvectors are linearly independent), then $P^{-1}AP = D = \operatorname{diag}(4, 0, 12)$.


Activity 8.25 Check that P is invertible. Then find $P^{-1}$ (which may
be calculated using either elementary row operations or the cofactor
method) and verify that $P^{-1}AP = D$.


Note how important it is to have checked P first. Calculating the inverse
of an incorrect matrix P would have been a huge wasted effort.
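The check AP = PD recommended above needs no matrix inverse, so it is also cheap to carry out by computer. The sketch below, in Python, assembles P and D from a list of (eigenvalue, eigenvector) pairs and compares the two products; the function names `mat_mul` and `build_p_and_d` are our own and purely illustrative.

```python
def mat_mul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def build_p_and_d(pairs):
    """Assemble P (eigenvectors as columns) and D (matching eigenvalues)."""
    n = len(pairs)
    p = [[pairs[j][1][i] for j in range(n)] for i in range(n)]
    d = [[pairs[i][0] if i == j else 0 for j in range(n)] for i in range(n)]
    return p, d


A = [[4, 0, 4], [0, 4, 4], [4, 4, 8]]
pairs = [(4, [-1, 1, 0]), (0, [-1, -1, 1]), (12, [1, 1, 2])]

P, D = build_p_and_d(pairs)
print(P)                                # [[-1, -1, 1], [1, -1, 1], [0, 1, 2]]
print(mat_mul(A, P) == mat_mul(P, D))   # True
```

Because each eigenvalue travels with its eigenvector in the list `pairs`, reordering the list reorders the columns of P and the diagonal of D together, which is exactly the matching of column orders described in the example.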


8.2.3 Geometrical interpretation 


There is a more sophisticated way to think about diagonalisation in terms
of change of basis and matrix representations of linear transformations.
Suppose that $T = T_A$ is the linear transformation corresponding to A,
so that $T(x) = Ax$ for all x. Then A is the matrix representing the linear
transformation T in standard coordinates.

Suppose that A has a set of n linearly independent eigenvectors
$B = \{v_1, v_2, \ldots, v_n\}$, corresponding (respectively) to the eigenvalues
$\lambda_1, \ldots, \lambda_n$. Then B is a basis of $\mathbb{R}^n$. What is the matrix representing T
with respect to this basis?

By Theorem 7.37, the matrix representing T in the basis B is

\[ A_{[B,B]} = P^{-1}AP, \]

where the columns of P are the basis vectors of B, so that

\[ P = (v_1 \ \ldots \ v_n). \]

In other words, the matrices A and $A_{[B,B]}$ are similar. They
represent the same linear transformation, but A does so with respect to the
standard basis and $A_{[B,B]}$ represents T in the basis B of eigenvectors
of A.

But what is $A_{[B,B]}$? According to Theorem 7.36, the ith column of
$A_{[B,B]}$ should be the coordinate vector of $T(v_i)$ with respect to the basis B.
Now, $T(v_i) = Av_i = \lambda_i v_i$, so the coordinate vector $[T(v_i)]_B$ is just the
vector with $\lambda_i$ in position i and all other entries zero.


Activity 8.26 Why is this true?


It follows that $A_{[B,B]}$ must be the diagonal matrix

\[ D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}. \]

We see, therefore, that

\[ P^{-1}AP = A_{[B,B]} = D. \]


Let’s explore this a little further to see what it reveals, geometrically,
about the linear transformation $T = T_A$. If $x \in \mathbb{R}^n$ is any vector, then its
image under the linear transformation T is particularly easy to calculate
in B coordinates. For example, suppose the B coordinates of x are given
by

\[ [x]_B = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}_B. \]

Then, since $[T(x)]_B = A_{[B,B]}[x]_B = D[x]_B$, we have

\[ [T(x)]_B = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}_B = \begin{pmatrix} \lambda_1 b_1 \\ \lambda_2 b_2 \\ \vdots \\ \lambda_n b_n \end{pmatrix}_B. \]

So the effect is simply to multiply each coordinate by the corresponding
eigenvalue.

This gives an interesting geometrical interpretation. We can
describe the linear transformation T as a stretch in the direction of
the eigenvector $v_i$ by a factor $\lambda_i$ (in the same direction if $\lambda_i > 0$ and in
the opposite direction if $\lambda_i < 0$). We say that the line $x = tv_i$, $t \in \mathbb{R}$,
is fixed by the linear transformation T in the sense that every point on
the line is mapped to a point on the same line. Indeed, this can be seen
directly. Since $Av_i = \lambda_i v_i$, each vector on the line $tv_i$ is mapped into
the scalar multiple $\lambda_i t v_i$ by the linear transformation T. If $\lambda_i = 0$, the
line $tv_i$ is mapped to 0.
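To see the rule "multiply each B coordinate by the corresponding eigenvalue" in action, here is a small Python sketch using the 2 x 2 matrix of Example 8.3 and its eigenvectors $(5, 2)^T$ and $(3, 1)^T$. The vector with B coordinates (2, 3) is an arbitrary choice of ours, and `mat_vec` is an illustrative helper, not notation from the text.

```python
def mat_vec(m, v):
    """Multiply a matrix (a list of rows) by a vector (a list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


A = [[7, -15], [2, -4]]
P = [[5, 3], [2, 1]]      # columns are the eigenvectors for 1 and 2
eigenvalues = [1, 2]

x_B = [2, 3]              # B coordinates of x (an arbitrary choice)
x = mat_vec(P, x_B)       # standard coordinates: x = P [x]_B

# In B coordinates, T just scales each coordinate by its eigenvalue.
Tx_B = [lam * b for lam, b in zip(eigenvalues, x_B)]

print(x)                  # [19, 7]
print(mat_vec(A, x))      # [28, 10]
print(mat_vec(P, Tx_B))   # [28, 10]
```

The last two lines agree: computing Ax directly in standard coordinates gives the same vector as scaling the B coordinates and converting back with P.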
Activity 8.27 Geometrically, how would you describe the linear
transformation $T_A(x) = Ax$ for Example 8.23?


Activity 8.28 Have another look at Example 7.42.


8.2.4 Similar matrices 


Now let’s consider any two similar matrices A and B with $B = P^{-1}AP$.
We will show that A and B have the same eigenvalues, and that they have
the same corresponding eigenvectors expressed in different coordinate
systems.

First, let’s look at this geometrically.

If $T = T_A$, then A is the matrix of the linear transformation T in
standard coordinates, and $B = P^{-1}AP$ is the matrix of the same linear
transformation T in coordinates with respect to the basis given by the
columns of the matrix P (see Section 7.4.2). As we have just seen,
the effect of T as a mapping $T : \mathbb{R}^n \to \mathbb{R}^n$ can be described in terms of
the eigenvalues and eigenvectors of A. But this description (involving
fixed lines and stretches) is intrinsic to the linear transformation, and
does not depend on the coordinate system being used to express the
vectors. Therefore, the eigenvalues of B must be the same as those of
A, and the corresponding eigenvectors must be the same vectors, only
given in a different basis.

To establish these facts algebraically, we begin with the following 
result: 


Theorem 8.29 Similar matrices have the same characteristic
polynomial.


Proof. Let A and B be similar matrices with $B = P^{-1}AP$. The
characteristic polynomial of A is given by the determinant $|A - \lambda I|$. The
characteristic polynomial of B is

\[ |B - \lambda I| = |P^{-1}AP - \lambda I| = |P^{-1}AP - \lambda P^{-1}P| = |P^{-1}AP - P^{-1}\lambda I P|, \]

since $P^{-1}P = I$. We now factor out $P^{-1}$ on the left and P on the right
to obtain

\[ |B - \lambda I| = |P^{-1}(A - \lambda I)P| = |P^{-1}|\,|A - \lambda I|\,|P| = |A - \lambda I|, \]

since the determinant of a product is the product of the determinants,
and since $|P^{-1}| = 1/|P|$.
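Theorem 8.29 can be illustrated on a small example. In the Python sketch below, A is the matrix of Example 8.3 and P is an arbitrary invertible matrix of our own choosing; for a 2 x 2 matrix the characteristic polynomial is $\lambda^2 - (\text{trace})\lambda + (\text{determinant})$, so it is enough to compare those two numbers. All function names are our own.

```python
from fractions import Fraction


def mat_mul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix [[a, b], [c, d]] with non-zero determinant."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def char_poly_coeffs(m):
    """Return (trace, determinant) of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    return (a + d, a * d - b * c)


A = [[7, -15], [2, -4]]
P = [[1, 2], [3, 5]]   # an arbitrary invertible matrix (our choice)
B = mat_mul(inverse_2x2(P), mat_mul(A, P))

print(B == [[170, 273], [-104, -167]])               # True
print(char_poly_coeffs(A) == char_poly_coeffs(B))    # True
```

The entries of B look nothing like those of A, yet both matrices have trace 3 and determinant 2, and hence the same characteristic polynomial $\lambda^2 - 3\lambda + 2$.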
We can now prove the following theorem:


Theorem 8.30 Similar matrices have the same eigenvalues, and the
same corresponding eigenvectors expressed in coordinates with respect
to different bases.


Proof: That similar matrices have the same eigenvalues is a direct
consequence of the previous theorem, since the matrices have the same
characteristic polynomials and the eigenvalues are the solutions of the
characteristic equations, $|A - \lambda I| = |B - \lambda I| = 0$.

Now for the eigenvectors. Let A and B be similar matrices, with
$B = P^{-1}AP$. We consider the invertible matrix P as the transition matrix
from coordinates in the basis S, consisting of the column vectors of P,
to standard coordinates, so that

\[ v = P[v]_S \quad \text{and} \quad [v]_S = P^{-1}v. \]

If $\lambda$ is any eigenvalue of A and v is a corresponding eigenvector, then
$Av = \lambda v$.

Using these facts, let’s see what happens if we multiply the matrix B
with the same eigenvector given in the S coordinates:

\[ \begin{aligned}
B[v]_S &= P^{-1}AP[v]_S \\
&= P^{-1}Av \\
&= P^{-1}\lambda v \\
&= \lambda P^{-1}v \\
&= \lambda [v]_S.
\end{aligned} \]

Since v is non-zero and $P^{-1}$ is invertible, $[v]_S = P^{-1}v$ is also non-zero.
Therefore, $[v]_S$ is an eigenvector of B corresponding to eigenvalue
$\lambda$.


8.3 When is diagonalisation possible? 


By Theorem 8.21, an $n \times n$ matrix is diagonalisable if and only if it has
n linearly independent eigenvectors. However, not all $n \times n$ matrices
have this property, and we now explore further the conditions under
which a matrix can be diagonalised.

First, we give two examples to show that not all matrices can be
diagonalised.
8.3.1 Examples of non-diagonalisable matrices


Example 8.31 The $2 \times 2$ matrix

\[ A = \begin{pmatrix} 4 & 1 \\ -1 & 2 \end{pmatrix} \]

has characteristic polynomial $\lambda^2 - 6\lambda + 9 = (\lambda - 3)^2$, so there is only
one eigenvalue, $\lambda = 3$. The eigenvectors are the non-zero solutions to
$(A - 3I)x = 0$: that is,

\[ \begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]

This is equivalent to the single equation $x_1 + x_2 = 0$, with general
solution $x_1 = -x_2$. Setting $x_2 = t$, we see that the solution set of the
system consists of all vectors of the form $v = (-t, t)^T$ as t runs through
all real numbers. So the eigenvectors are precisely the non-zero scalar
multiples of the vector $v = (-1, 1)^T$. Any two eigenvectors are therefore
scalar multiples of each other and hence form a linearly dependent set.
In other words, there are not two linearly independent eigenvectors, and
the matrix A is not diagonalisable.
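The conclusion of Example 8.31 can be phrased in terms of dimension: the eigenspace for 3 is the null space of $A - 3I$, and its dimension is the number of columns minus the rank. The Python sketch below computes this, taking $A - 3I$ to have rows (1, 1) and (-1, -1), which is consistent with the single equation $x_1 + x_2 = 0$ in the example. The function `rank` is our own illustrative helper.

```python
from fractions import Fraction


def rank(m):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


A_minus_3I = [[1, 1], [-1, -1]]
eigenspace_dim = len(A_minus_3I[0]) - rank(A_minus_3I)
print(eigenspace_dim)  # 1
```

A 2 x 2 matrix needs two linearly independent eigenvectors to be diagonalisable; here the only eigenspace has dimension 1, so there cannot be two.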


There is another reason why a matrix A may not be diagonalisable over
the real numbers. Consider the following example:


Example 8.32 If A is the matrix

\[ A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \]

then the characteristic equation

\[ |A - \lambda I| = \begin{vmatrix} -\lambda & -1 \\ 1 & -\lambda \end{vmatrix} = \lambda^2 + 1 = 0 \]

has no real solutions.
This matrix A can be diagonalised over the complex numbers, but
not over the real numbers. (We will look at complex numbers and
matrices in Chapter 13.)
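For a 2 x 2 matrix, the three situations met so far (two distinct real eigenvalues, one repeated real eigenvalue, no real eigenvalues) can be told apart by the discriminant of the characteristic polynomial. The following Python sketch is only an illustration of this observation; the function name is our own, and the second matrix is taken to be one with characteristic polynomial $(\lambda - 3)^2$, as in Example 8.31.

```python
def real_eigenvalue_count(m):
    """Number of distinct real roots of the characteristic polynomial
    of a 2 x 2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    disc = (a + d) ** 2 - 4 * (a * d - b * c)
    if disc > 0:
        return 2
    if disc == 0:
        return 1
    return 0


print(real_eigenvalue_count([[0, -1], [1, 0]]))     # 0
print(real_eigenvalue_count([[4, 1], [-1, 2]]))     # 1
print(real_eigenvalue_count([[7, -15], [2, -4]]))   # 2
```

Note that a count of 1 does not by itself decide whether the matrix is diagonalisable: that depends on how many linearly independent eigenvectors the repeated eigenvalue has.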


So far, and until Chapter 13, we are dealing with matrices A with real
number entries. If A is diagonalisable, so that there is an invertible P
(with real number entries) with $P^{-1}AP = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$, then, as
we have seen, the $\lambda_i$ are the eigenvalues of A. So, it follows that all
the eigenvalues must be real numbers. Example 8.32 is an example of
a matrix that fails to be diagonalisable because it does not have this
property. On the other hand, the matrix in Example 8.31 does have only
real eigenvalues, yet fails to be diagonalisable. We will return shortly to
the general question of when a matrix can be diagonalised. But for now
we consider the special case in which an $n \times n$ matrix has n different
(real) eigenvalues.


8.3.2 Matrices with distinct eigenvalues 


We now show that if a matrix has n different eigenvalues (that is, if 
it has distinct eigenvalues), then it will be diagonalisable. This is a 
consequence of the following useful result. The proof we give here is a 
proof by contradiction. 


Theorem 8.33 Ligenvectors corresponding to different eigenvalues are 
linearly independent. 


Proof. Suppose the result is false for the n x n matrix A. Let’s take 
any smallest possible set S of eigenvectors corresponding to distinct 
eigenvalues of A with the property that the set is linearly dependent. 
(This set S will have at least 2 and at most n members.) So, S consists 
of eigenvectors of A, each corresponding to different eigenvalues, and 
it is a linearly dependent set; and, furthermore, any proper subset of S is 
not a linearly dependent set. Call the vectors in this set 
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$. 
Then, because S is linearly dependent, there are non-zero numbers 
$c_1, c_2, \ldots, c_k$ such that 

$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k = \mathbf{0}.$$


(You might wonder why we assert that all the $c_i$ are non-zero, rather 
than just that not all of them are zero. But remember that no proper 
subset of S is linearly dependent. If $c_i$ were 0, we could delete 
$\mathbf{v}_i$ from S and have a proper subset of S that is linearly 
dependent, which can’t be the case.) 
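Multiplying this equation by A, and writing $\lambda_i$ for the eigenvalue 
corresponding to $\mathbf{v}_i$, we have: 

$$\begin{aligned}
A(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k)
  &= c_1A\mathbf{v}_1 + c_2A\mathbf{v}_2 + \cdots + c_kA\mathbf{v}_k \\
  &= \lambda_1c_1\mathbf{v}_1 + \lambda_2c_2\mathbf{v}_2 + \cdots + \lambda_kc_k\mathbf{v}_k.
\end{aligned}$$

But this must be equal to $A\mathbf{0} = \mathbf{0}$, since 
$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k = \mathbf{0}$. 
Hence we have 

$$L_1 = \lambda_1c_1\mathbf{v}_1 + \lambda_2c_2\mathbf{v}_2 + \cdots + \lambda_kc_k\mathbf{v}_k = \mathbf{0}.$$

Furthermore, if we simply multiply both sides of the equation 

$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k = \mathbf{0}$$

by $\lambda_1$, we obtain 

$$L_2 = \lambda_1c_1\mathbf{v}_1 + \lambda_1c_2\mathbf{v}_2 + \cdots + \lambda_1c_k\mathbf{v}_k = \mathbf{0}.$$

It follows that 

$$\begin{aligned}
L_1 - L_2 &= (\lambda_1c_1\mathbf{v}_1 + \cdots + \lambda_kc_k\mathbf{v}_k) - (\lambda_1c_1\mathbf{v}_1 + \cdots + \lambda_1c_k\mathbf{v}_k) \\
          &= \mathbf{0} - \mathbf{0} \\
          &= \mathbf{0},
\end{aligned}$$

which means 

$$(\lambda_2 - \lambda_1)c_2\mathbf{v}_2 + \cdots + (\lambda_k - \lambda_1)c_k\mathbf{v}_k = \mathbf{0}.$$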


Since the $\lambda_i$ are distinct and the $c_i$ are non-zero, this says that the 
vectors $\mathbf{v}_2, \ldots, \mathbf{v}_k$ are linearly dependent, which 
contradicts the original assumption that no proper subset of S is linearly 
dependent. So we must conclude (for otherwise, there is a contradiction) 
that there is no such set S. That means that any set of eigenvectors 
corresponding to distinct eigenvalues is linearly independent. 


It follows that if an n x n matrix has n different eigenvalues, then a 
set consisting of one eigenvector for each eigenvalue will be a linearly 
independent set of size n and hence, by Theorem 8.21, the matrix will 
be diagonalisable. That is, we have the following theorem. 


Theorem 8.34 If an $n \times n$ matrix has n different eigenvalues, then 
it has a set of n linearly independent eigenvectors and is therefore 
diagonalisable. 


8.3.3 The general case 


Theorem 8.34 provides a sufficient condition for an n x n matrix to 
be diagonalisable: if it has n different (real) eigenvalues, then it is 
diagonalisable. It is not, however, necessary for the eigenvalues to be 
distinct in order for the matrix to be diagonalisable. What is needed for 
diagonalisation is a set of n linearly independent eigenvectors, and this 
can happen even when there is a ‘repeated’ eigenvalue (that is, when 
there are fewer than n different eigenvalues). The following example 
illustrates this. 


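Example 8.35 Consider the matrix 

$$A = \begin{pmatrix} 3 & -1 & 1 \\ 0 & 2 & 0 \\ 1 & -1 & 3 \end{pmatrix}.$$

The eigenvalues are given by the solutions of the characteristic equation 
$|A - \lambda I| = 0$. Expanding the determinant by the second row, 

$$\begin{aligned}
|A - \lambda I|
  &= \begin{vmatrix} 3-\lambda & -1 & 1 \\ 0 & 2-\lambda & 0 \\ 1 & -1 & 3-\lambda \end{vmatrix}
   = (2-\lambda)\begin{vmatrix} 3-\lambda & 1 \\ 1 & 3-\lambda \end{vmatrix} \\
  &= (2-\lambda)(\lambda^2 - 6\lambda + 9 - 1) \\
  &= (2-\lambda)(\lambda^2 - 6\lambda + 8) \\
  &= (2-\lambda)(\lambda-4)(\lambda-2) = -(\lambda-2)^2(\lambda-4).
\end{aligned}$$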


The matrix A has only two eigenvalues: $\lambda = 4$ and $\lambda = 2$. Because 
$(\lambda - 2)^2$ is a factor of the characteristic polynomial (or, equivalently, 
$\lambda = 2$ is a double root of the polynomial), we say that $\lambda = 2$ is an 
eigenvalue of multiplicity 2. If we want to diagonalise the matrix, we 
need to find three linearly independent eigenvectors. Any eigenvector 
corresponding to $\lambda = 4$ will be linearly independent of any eigenvectors 
corresponding to the eigenvalue 2. What we therefore need to do 
is to find two linearly independent eigenvectors corresponding to the 
eigenvalue 2 of multiplicity 2. (Then these two vectors taken together 
with an eigenvector corresponding to $\lambda = 4$ will give a linearly 
independent set.) So let’s look first at the eigenvalue $\lambda = 2$. We row reduce 
the matrix $(A - 2I)$: 


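$$(A - 2I) = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 0 & 0 \\ 1 & -1 & 1 \end{pmatrix}
\longrightarrow \begin{pmatrix} 1 & -1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$


We see immediately that this matrix has rank 1, so its null space (the 
eigenspace for $\lambda = 2$) will have dimension 2, and we can find a basis of 
this space consisting of two linearly independent eigenvectors. Setting 
the non-leading variables equal to arbitrary parameters s and t, we find 
that the solutions of $(A - 2I)\mathbf{x} = \mathbf{0}$ are 

$$\mathbf{x} = s\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}
 = s\mathbf{v}_1 + t\mathbf{v}_2, \quad s, t \in \mathbb{R},$$

where $\mathbf{v}_1$ and $\mathbf{v}_2$ are two linearly independent eigenvectors for $\lambda = 2$. 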




Activity 8.36 How do you know that $\mathbf{v}_1$ and $\mathbf{v}_2$ are linearly independent? 


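Since $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a linearly independent set, and since eigenvectors 
corresponding to distinct eigenvalues are linearly independent, it follows 
that if $\mathbf{v}_3$ is any eigenvector corresponding to $\lambda = 4$, then 
$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ will be a linearly independent set. 
We find an eigenvector for $\lambda = 4$ by reducing $(A - 4I)$. 

$$(A - 4I) = \begin{pmatrix} -1 & -1 & 1 \\ 0 & -2 & 0 \\ 1 & -1 & -1 \end{pmatrix}
\longrightarrow \cdots \longrightarrow \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

with solutions 

$$\mathbf{x} = t\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \quad t \in \mathbb{R}.$$

Let 

$$\mathbf{v}_3 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.$$

Then $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ form a linearly independent set of eigenvectors. If we 
take 

$$P = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix},$$

then 

$$P^{-1}AP = D = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$


Activity 8.37 Check this! Check that $AP = PD$ and that $|P| \neq 0$. 
Why do these two checks enable you to find any errors? 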


Here is another example where, this time, diagonalisation is not 
possible. 


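Example 8.38 We found in Example 8.7 that the matrix 

$$A = \begin{pmatrix} -3 & -1 & -2 \\ 1 & -1 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$

has an eigenvalue $\lambda_1 = -1$ of multiplicity 2, and a second eigenvalue, 
$\lambda_2 = -2$. In order to diagonalise this matrix, we need two linearly 
independent eigenvectors for $\lambda = -1$. To see if this is possible, we row 
reduce the matrix $(A + I)$: 

$$(A + I) = \begin{pmatrix} -2 & -1 & -2 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}
\longrightarrow \cdots \longrightarrow \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$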




This matrix has rank 2 and the null space (the eigenspace for $\lambda = -1$) 
therefore (by the rank-nullity theorem) has dimension 1. We can only 
find one linearly independent eigenvector for $\lambda = -1$. All solutions of 
$(A + I)\mathbf{x} = \mathbf{0}$ are of the form 

$$\mathbf{x} = t\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \quad t \in \mathbb{R}.$$


We conclude that this matrix cannot be diagonalised as it is not possible 
to find three linearly independent eigenvectors to form the matrix P. 




8.3.4 Algebraic and geometric multiplicity 


To describe in more detail what it is that makes a matrix diagonalisable 
(and what it is that distinguishes the matrices in Example 8.35 and 
Example 8.38), we introduce the concepts of algebraic and geometric 
multiplicity of eigenvalues. 


Definition 8.39 (Algebraic multiplicity) An eigenvalue $\lambda_0$ of a matrix 
A has algebraic multiplicity k if k is the largest integer such that 
$(\lambda - \lambda_0)^k$ is a factor of the characteristic polynomial of A. 


Definition 8.40 (Geometric multiplicity) The geometric multiplicity 
of an eigenvalue $\lambda_0$ of a matrix A is the dimension of the eigenspace 
of $\lambda_0$ (that is, the dimension of the null space, $N(A - \lambda_0 I)$, of 
$A - \lambda_0 I$). 


If A is an $n \times n$ matrix with an eigenvalue $\lambda$, then we know that there 
is at least one eigenvector corresponding to $\lambda$. Why? Since we know 
that $|A - \lambda I| = 0$, we know that $(A - \lambda I)\mathbf{v} = \mathbf{0}$ has a non-trivial 
solution $\mathbf{v}$, which is an eigenvector corresponding to $\lambda$. So the eigenspace 
of any eigenvalue has dimension at least 1, and hence the geometric 
multiplicity, $\dim(N(A - \lambda I))$, is at least 1. 
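
By the rank-nullity theorem, the geometric multiplicity of an eigenvalue $\lambda_0$ of an $n \times n$ matrix A can be computed as $n - \operatorname{rank}(A - \lambda_0 I)$. As an illustration only, here is a small Python sketch that does this with exact rational arithmetic for the matrices of Examples 8.35 and 8.38. The helper names `rank` and `geometric_multiplicity` are our own, not notation from the text.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows), by row reduction over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0  # number of leading ones (pivots) found so far
    for col in range(len(rows[0])):
        # Look for a row at or below position r with a non-zero entry here.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear this column in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def geometric_multiplicity(a, lam):
    """dim N(A - lam*I) = n - rank(A - lam*I)."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)


a_835 = [[3, -1, 1], [0, 2, 0], [1, -1, 3]]     # Example 8.35
a_838 = [[-3, -1, -2], [1, -1, 1], [1, 1, 0]]   # Example 8.38

print(geometric_multiplicity(a_835, 2))    # eigenvalue 2 of Example 8.35
print(geometric_multiplicity(a_838, -1))   # eigenvalue -1 of Example 8.38
```

The first value is 2 and the second is 1, in agreement with the row reductions carried out by hand in the two examples.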

In Example 8.31 we have an eigenvalue (namely, $\lambda = -1$) of 
algebraic multiplicity 2, but because the eigenspace only has dimension 
one, there do not exist two linearly independent eigenvectors. Here, 
the fact that the geometric multiplicity is less than the algebraic 
multiplicity means that the matrix cannot be diagonalised. For, it turns 
out that if we are to find enough linearly independent eigenvectors to 
diagonalise a matrix, then, for each eigenvalue, the algebraic and 
geometric multiplicities must be equal. We will prove this. First, though, 
we have a straightforward relationship between algebraic and geometric 
multiplicity, which has been alluded to in the above examples. 


Theorem 8.41 For any eigenvalue of a square matrix, the geometric 
multiplicity is no more than the algebraic multiplicity. 


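Proof. Let’s suppose that $\mu$ is an eigenvalue of the $n \times n$ matrix 
A and that $\mu$ has geometric multiplicity k. Then there is a linearly 
independent set $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ of eigenvectors of A 
corresponding to $\mu$. By Theorem 6.45, we can extend this to a basis 
$B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k, \mathbf{v}_{k+1}, \ldots, \mathbf{v}_n\}$ of $\mathbb{R}^n$. 


Let T be the linear transformation given by multiplication by A; 
that is, $T(\mathbf{x}) = A\mathbf{x}$. We now apply Theorem 7.37. According to this 
theorem, the matrix M representing T with respect to the basis B is 
$P^{-1}AP$, where the columns of P are the vectors of the basis B. But, by 
Theorem 7.36, column i of M is equal to $[A\mathbf{v}_i]_B$. So, since 
$A\mathbf{v}_i = \mu\mathbf{v}_i$ for $i = 1, 2, \ldots, k$, we must have 

$$M = P^{-1}AP = \begin{pmatrix}
\mu & 0 & \cdots & 0 & * & \cdots & * \\
0 & \mu & \cdots & 0 & * & \cdots & * \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \mu & * & \cdots & * \\
0 & 0 & \cdots & 0 & * & \cdots & * \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & * & \cdots & *
\end{pmatrix},$$

a matrix in which, for $i = 1, 2, \ldots, k$, column i has $\mu$ in position i 
and 0 elsewhere (the entries marked $*$ are unspecified). (So, the top-left 
$k \times k$ submatrix is $\mu$ times the $k \times k$ identity matrix.) Now, it 
follows that the characteristic polynomial of M will be 

$$|M - \lambda I| = \begin{vmatrix}
\mu-\lambda & 0 & \cdots & 0 & * & \cdots & * \\
0 & \mu-\lambda & \cdots & 0 & * & \cdots & * \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \mu-\lambda & * & \cdots & * \\
0 & 0 & \cdots & 0 & * & \cdots & * \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & * & \cdots & *
\end{vmatrix}
= (\mu - \lambda)^k q(\lambda),$$

where $q(\lambda)$ is the determinant of the bottom-right $(n - k) \times (n - k)$ 
submatrix of $M - \lambda I$. So $(\lambda - \mu)^k$ divides the characteristic polynomial 
of M, which, as we saw earlier (Theorem 8.29), is the same as the 
characteristic polynomial of A. So the algebraic multiplicity of $\mu$ is at 
least k, the geometric multiplicity. 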


The following theorem provides a characterisation of diagonalisable 
matrices in terms of algebraic and geometric multiplicities. The proof 
might look daunting, but its key ideas are not so hard. 


Theorem 8.42 A matrix is diagonalisable if and only if all its 
eigenvalues are real numbers and, for each eigenvalue, the geometric 
multiplicity equals the algebraic multiplicity. 


Proof. We have already noted earlier that if a matrix is to be 
diagonalisable, then all its eigenvalues must be real numbers. Suppose A is an 
$n \times n$ matrix with real eigenvalues, and denote the distinct eigenvalues 
by $\lambda_1, \ldots, \lambda_r$. Then $r \le n$ and the characteristic polynomial of A takes 
the form 

$$p(\lambda) = |A - \lambda I| = (-1)^n (\lambda - \lambda_1)^{k_1} (\lambda - \lambda_2)^{k_2} \cdots (\lambda - \lambda_r)^{k_r},$$

where $k_i$ is the algebraic multiplicity of $\lambda_i$. But $p(\lambda)$ is of degree n and 
hence $n = k_1 + k_2 + \cdots + k_r$. 

To be diagonalisable, there must be a basis consisting of n 
eigenvectors of A. We know that if $m_i$ is the geometric multiplicity of $\lambda_i$, 
then $m_i \le k_i$. Suppose that $m_j < k_j$ for some j. Then there will not be 
a linearly independent set of $k_j$ eigenvectors corresponding to $\lambda_j$. But 
that means there cannot be a set of n linearly independent eigenvectors 
of A. To see why, we note that in any set S of linearly independent 
eigenvectors, each eigenvector must correspond to some eigenvalue $\lambda_i$ 
and, by the definition of geometric multiplicity, no more than $m_i$ of 
these can correspond to $\lambda_i$, for each i. So the maximum number of 
vectors in the set S is 

$$m_1 + m_2 + \cdots + m_j + \cdots + m_r.$$

But since $m_i \le k_i$ for all i, and $m_j < k_j$, we have 

$$m_1 + m_2 + \cdots + m_j + \cdots + m_r < k_1 + k_2 + \cdots + k_j + \cdots + k_r = n.$$

So, S contains fewer than n vectors, and A will not be diagonalisable. 

The argument so far shows that A will be diagonalisable only if all 
its eigenvalues are real numbers and, for each eigenvalue, the geometric 
multiplicity equals the algebraic multiplicity. We now need to show the 
converse. 

Suppose, then, that A has only real eigenvalues and that, for each, the 
algebraic and geometric multiplicities are equal. Suppose the 
eigenvalues are $\lambda_1, \lambda_2, \ldots, \lambda_r$ and that, for each i, the multiplicity 
(algebraic and geometric) of $\lambda_i$ is $m_i$. Let 
$S_i = \{\mathbf{v}_1^{(i)}, \mathbf{v}_2^{(i)}, \ldots, \mathbf{v}_{m_i}^{(i)}\}$ be a linearly 
independent set of eigenvectors for $\lambda_i$. We know such a set exists because 
the geometric multiplicity is $m_i$. Then the set $S = S_1 \cup S_2 \cup \cdots \cup S_r$ 
(the union of the sets $S_i$) is a set of eigenvectors for A and we will show 
that it is linearly independent, which will imply that A is diagonalisable 
(since S contains $m_1 + m_2 + \cdots + m_r = n$ vectors). 
So, suppose some linear combination of the vectors in S is $\mathbf{0}$. We can 
write this as 

$$a_1^{(1)}\mathbf{v}_1^{(1)} + \cdots + a_{m_1}^{(1)}\mathbf{v}_{m_1}^{(1)} + \cdots + a_1^{(r)}\mathbf{v}_1^{(r)} + \cdots + a_{m_r}^{(r)}\mathbf{v}_{m_r}^{(r)} = \mathbf{0}.$$

For each i, let 

$$\mathbf{w}^{(i)} = a_1^{(i)}\mathbf{v}_1^{(i)} + a_2^{(i)}\mathbf{v}_2^{(i)} + \cdots + a_{m_i}^{(i)}\mathbf{v}_{m_i}^{(i)}.$$

Then this equation can be written as 

$$\mathbf{w}^{(1)} + \mathbf{w}^{(2)} + \cdots + \mathbf{w}^{(r)} = \mathbf{0}. \qquad (*)$$

Now, for any i, $\mathbf{w}^{(i)}$ is a linear combination of eigenvectors 
corresponding to $\lambda_i$, so it is either $\mathbf{0}$ or is itself an eigenvector (since it 
belongs to the eigenspace). However, if any of the $\mathbf{w}^{(i)}$ is not $\mathbf{0}$, then 
equation $(*)$ shows that a non-trivial linear combination of eigenvectors 
corresponding to distinct eigenvalues is $\mathbf{0}$, and this is not possible since, 
by Theorem 8.33, eigenvectors for distinct eigenvalues are linearly 
independent. It follows that, for all i, $\mathbf{w}^{(i)} = \mathbf{0}$. Therefore, 

$$a_1^{(i)}\mathbf{v}_1^{(i)} + a_2^{(i)}\mathbf{v}_2^{(i)} + \cdots + a_{m_i}^{(i)}\mathbf{v}_{m_i}^{(i)} = \mathbf{0}.$$

But the set $S_i = \{\mathbf{v}_1^{(i)}, \mathbf{v}_2^{(i)}, \ldots, \mathbf{v}_{m_i}^{(i)}\}$ is linearly 
independent, so it follows that 

$$a_1^{(i)} = a_2^{(i)} = \cdots = a_{m_i}^{(i)} = 0.$$

So all the coefficients $a_j^{(i)}$ are 0. This shows that the set S is linearly 
independent. 


8.4 Learning outcomes 


You should now be able to: 


• state what is meant by the characteristic polynomial and the 
  characteristic equation of a matrix 

• state carefully what is meant by eigenvectors and eigenvalues, and 
  by diagonalisation 

• find eigenvalues and corresponding eigenvectors for a square matrix 

• state what is meant by the eigenspace of an eigenvalue 

• know how the eigenvalues are related to the determinant and trace 
  of a matrix 

• diagonalise a diagonalisable matrix 

• determine whether or not a matrix can be diagonalised 

• recognise what diagonalisation does in terms of change of basis and 
  matrix representation of linear transformations (similarity) 

• use diagonalisation to describe the geometric effect of a linear 
  transformation 

• know how to characterise diagonalisability in terms of the algebraic 
  and geometric multiplicities of eigenvalues. 


8.5 Comments on activities 


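Activity 8.6 The eigenvectors for $\lambda = 0$ are the non-zero solutions of 
$A\mathbf{x} = \mathbf{0}$. To find these, row reduce the coefficient matrix A. 

$$\begin{pmatrix} 4 & 0 & 4 \\ 0 & 4 & 4 \\ 4 & 4 & 8 \end{pmatrix}
\longrightarrow \cdots \longrightarrow \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

The solutions are 

$$\mathbf{x} = t\begin{pmatrix} -1 \\ -1 \\ 1 \end{pmatrix}, \quad t \in \mathbb{R},$$

so that the eigenvectors are non-zero multiples of $\mathbf{v}_2 = (-1, -1, 1)^T$. 
The eigenspace of $\lambda = 0$ is the null space of the matrix A. Note that 
$A\mathbf{v}_2 = 0\mathbf{v}_2 = \mathbf{0}$. 

Similarly, you should find that for $\lambda = 12$, the eigenvectors are 
non-zero multiples of $\mathbf{v}_3 = (1, 1, 2)^T$. 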


Activity 8.10 Since $A\mathbf{x} = \lambda\mathbf{x} \iff A\mathbf{x} - \lambda\mathbf{x} = (A - \lambda I)\mathbf{x} = \mathbf{0}$, the 
two sets contain precisely the same vectors. 


Activity 8.19 You should notice that the columns of AP are $\mathbf{v}_1$ and $2\mathbf{v}_2$, 
where $\mathbf{v}_1, \mathbf{v}_2$ are the columns of P. 


Activity 8.24 Perform the matrix multiplication to show that 

$$AP = (4\mathbf{v}_1 \;\; 0\mathbf{v}_2 \;\; 12\mathbf{v}_3) = PD.$$


Activity 8.25 Since $|P| = 6 \neq 0$, P is invertible. Using the adjoint 
method (or row reduction), obtain 

$$P^{-1} = \frac{1}{6}\begin{pmatrix} -3 & 3 & 0 \\ -2 & -2 & 2 \\ 1 & 1 & 2 \end{pmatrix}.$$

Check that $PP^{-1} = I$. You have calculated AP in the previous activity, 
so now just multiply $P^{-1}AP$ to obtain D. 


Activity 8.26 Since $\mathbf{v}_i$ is the ith vector in the basis B, writing 
$T(\mathbf{v}_i) = \lambda_i\mathbf{v}_i$ expresses it as a linear combination of the basis vectors 
of B, so the B coordinates are precisely as stated: $\lambda_i$ in the ith position 
and 0 elsewhere. 


Activity 8.27 T is a stretch by a factor of 4 in the direction of the vector 
$\mathbf{v}_1 = (-1, 1, 0)^T$, a stretch by a factor of 12 in the direction of 
$\mathbf{v}_3 = (1, 1, 2)^T$ and it maps the line $\mathbf{x} = t\mathbf{v}_2$ to $\mathbf{0}$. 


Activity 8.36 This is immediately obvious since setting $s\mathbf{v}_1 + t\mathbf{v}_2 = \mathbf{0}$, 
the second components tell us $s = 0$ and the third components that 
$t = 0$. However, this is a good time to recall that the method of 
solution ensures that the vectors will be linearly independent; see the 
discussion at the end of Section 6.5.2. 


Activity 8.37 If you know that $AP = PD$, then you know that the 
eigenvectors are correct and the eigenvalues are in the correct positions 
in D. If you also check that $|P| \neq 0$, then you know that you have 
chosen three linearly independent eigenvectors, so $P^{-1}$ exists and then 
$P^{-1}AP = D$. If any of the checks fail, then you should be able to find 
any errors in your choice of eigenvectors and eigenvalues. 
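
The two checks can also be carried out mechanically. The short Python sketch below does so for the matrices A, P and D of Example 8.35, namely $A = \begin{pmatrix} 3 & -1 & 1 \\ 0 & 2 & 0 \\ 1 & -1 & 3 \end{pmatrix}$, with the columns of P taken to be $\mathbf{v}_3, \mathbf{v}_1, \mathbf{v}_2$ and $D = \operatorname{diag}(4, 2, 2)$. The helper names `matmul` and `det3` are our own; this is an illustration of the checks, not a replacement for doing them by hand.

```python
def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def det3(m):
    """Determinant of a 3x3 matrix, expanding along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


A = [[3, -1, 1], [0, 2, 0], [1, -1, 3]]
P = [[1, 1, -1], [0, 1, 0], [1, 0, 1]]   # columns are v3, v1, v2
D = [[4, 0, 0], [0, 2, 0], [0, 0, 2]]

print(matmul(A, P) == matmul(P, D))   # each column of P is an eigenvector
print(det3(P) != 0)                   # the columns are linearly independent
```

Both checks succeed here; changing any one entry of P or D would typically make the first comparison fail, which is what makes it a useful test.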


8.6 Exercises 
Exercise 8.1 Diagonalise the matrix 


4 5 
a=(f B 


that is, find an invertible matrix P and a diagonal matrix D such that 
$P^{-1}AP = D$. Check your answer. 


Exercise 8.2 Find the eigenvalues of the matrix 

$$B = \begin{pmatrix} 0 & 2 & 1 \\ 16 & 4 & -6 \\ -16 & 4 & 10 \end{pmatrix}$$

and find an eigenvector for each eigenvalue. Hence find an invertible 
matrix P and a diagonal matrix D such that $P^{-1}BP = D$. Check your 
work. 


Exercise 8.3 Determine if either of the following matrices can be 
diagonalised: 


1 1 1 1l 
A=(4 i B=(; i) 


Exercise 8.4 Let M be an $n \times n$ matrix. State precisely what is meant 
by the statement 

‘$\lambda$ is an eigenvalue of M with corresponding eigenvector $\mathbf{v}$.’ 


Exercise 8.5 Let 

$$A = \begin{pmatrix} 6 & 13 & -8 \\ 2 & 5 & -2 \\ 7 & 17 & -9 \end{pmatrix}, \quad
\mathbf{v} = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.$$

Using the definition of eigenvector, show that $\mathbf{v}$ is an eigenvector of A 
and find its corresponding eigenvalue. 

The matrix A defines a linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ by 
$T(\mathbf{x}) = A\mathbf{x}$. It is known that T fixes a non-zero vector $\mathbf{x}$, $T(\mathbf{x}) = \mathbf{x}$. Use 
this information to determine another eigenvector and eigenvalue of A. 
Check your result. 

Diagonalise the matrix A: write down an invertible matrix P and a 
diagonal matrix D such that $P^{-1}AP = D$. 

Describe the linear transformation T. 


Exercise 8.6 Show that the vector x is an eigenvector of A, where: 


-1 1 2 1 
TE z s). = (1). 
0 1 1 1 


What is the corresponding eigenvalue? 

Find the other eigenvalues of A, and an eigenvector for each of 
them. Find an invertible matrix P and a diagonal matrix D such that 
$P^{-1}AP = D$. Check that $AP = PD$. 


Exercise 8.7 Diagonalise the matrix A: 

$$A = \begin{pmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{pmatrix}.$$

Describe the eigenspace of each eigenvalue. 


Exercise 8.8 Prove the following statement: 

0 is an eigenvalue of A if and only if $A\mathbf{x} = \mathbf{0}$ has a non-trivial solution. 


Exercise 8.9 Look again at Exercise 6.7. Repeating what you did there, 
show that two eigenvectors corresponding to distinct eigenvalues are 
linearly independent. 

Using an inductive argument, prove that eigenvectors corresponding 
to distinct eigenvalues are linearly independent; that is, give another 
proof of Theorem 8.33. 


Exercise 8.10 Suppose that A is a real diagonalisable matrix and that 
all the eigenvalues of A are non-negative. Prove that there is a matrix 
B such that $B^2 = A$. 


8.7 Problems 


Problem 8.1 Determine which, if any, of the following vectors are 
eigenvectors for the given matrix A: 


1 1 5 1 1 0 
r= (-1), i z= (0); a=: 4 s). 
3 3 1 0 3 1 


Problem 8.2 Find the eigenvalues and corresponding eigenvectors for 


the matrix 
1 4 
A=(, ar 


Hence, find an invertible matrix P such that $P^{-1}AP$ is diagonal. 
Calculate $P^{-1}AP$ to check your answer. 


Problem 8.3 Diagonalise the matrix 


Describe (geometrically) the linear transformation T : R? — R? given 
by T(x) = Ax. 




Problem 8.4 Find the characteristic equation of the matrix B, 


3- led 
s-s —3 I 
1 -l 2 


Find the eigenvalues (which are integers) and corresponding 
eigenvectors for B. 

Find a basis of $\mathbb{R}^3$ consisting of eigenvectors of the matrix B. 

Find an invertible matrix P and a diagonal matrix D such that 
$P^{-1}BP = D$. Check your answer for P by showing that $BP = PD$. 
Then calculate $P^{-1}$ and check that $P^{-1}BP = D$. 


Problem 8.5 Diagonalise the matrix 


5 0 4 
=i —1 2). 
2 0 3 


Problem 8.6 Explain why the matrix 

$$C = \begin{pmatrix} 5 & 0 & 4 \\ a & -1 & b \\ 2 & 0 & 3 \end{pmatrix}$$

can be diagonalised for any values of $a, b \in \mathbb{R}$. 


Problem 8.7 Find the eigenvalues of the matrices 


1 1 1 —2 1 -2 
a=(0 1 -1] and =(= 0 1 ) 
1 0 2 Qe a do “2 


and show that neither matrix can be diagonalised over the real numbers. 


Problem 8.8 Consider the matrix A and the vector $\mathbf{v}$: 


—5 8 32 2 
a=(2 8), v= (-2), 
—2 2 Ii 1 


Show that v is an eigenvector of A and find the corresponding eigen- 
value. Find all the eigenvectors corresponding to this eigenvalue, and 
hence describe (geometrically) the eigenspace. 

Diagonalise the matrix A. 


Problem 8.9 Let the matrix A and the vector $\mathbf{v}_1$ be as follows: 

$$A = \begin{pmatrix} 4 & 3 & -7 \\ 1 & 2 & 1 \\ 2 & 2 & -3 \end{pmatrix}, \quad
\mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}.$$

(a) Show that $\mathbf{v}_1$ is an eigenvector of A and find its corresponding 
eigenvalue. 

Diagonalise the matrix A; that is, find an invertible matrix P and 
a diagonal matrix D such that $P^{-1}AP = D$. Check your answer 
without finding $P^{-1}$. 

(b) Deduce the value of $|A|$ from the eigenvalues, and show that A is 
invertible. 

Indicate how to diagonalise $A^{-1}$ without any further 
calculations. (Find its matrix of eigenvectors and corresponding diagonal 
matrix.) 

(c) Find the missing entries $s_{12}$ and $s_{31}$ of $A^{-1}$: 

$$A^{-1} = -\frac{1}{3}\begin{pmatrix} -8 & s_{12} & 17 \\ 5 & 2 & -11 \\ s_{31} & -2 & 5 \end{pmatrix}.$$

Then verify that $A^{-1}\mathbf{v} = \lambda\mathbf{v}$ for each of the eigenvalues and 
eigenvectors of $A^{-1}$ found in part (b). 


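Problem 8.10 Show that one of the following two matrices can be 
diagonalised and the other cannot: 

$$A = \begin{pmatrix} 2 & 3 & 0 \\ 3 & 2 & 0 \\ 1 & 1 & 5 \end{pmatrix}, \quad
B = \begin{pmatrix} 2 & 3 & 0 \\ 3 & 2 & 0 \\ 1 & -1 & 5 \end{pmatrix}.$$

Diagonalise the appropriate matrix. 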


Problem 8.11 Suppose that you would like to find a linear 
transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ which is a stretch by a factor of two in the 
direction $\mathbf{v}_1 = (1, 0, 1)^T$, which fixes every point on the line 
$\mathbf{x} = t\mathbf{v}_2$, where $\mathbf{v}_2 = (1, 1, 0)^T$, and which maps the line 
$\mathbf{x} = t\mathbf{v}_3$, where $\mathbf{v}_3 = (2, 1, 1)^T$, to $\mathbf{0}$. 

Show that no such linear transformation can exist. 


Problem 8.12 Suppose that A and B are diagonalisable n x n matrices 
with the same eigenvalues. Prove that A and B are similar matrices. 


Problem 8.13 Diagonalise each of the following matrices A and B, 
5 4 —5 24 
4=(2, ae B=(> ai 
Show that A and B are similar by finding an invertible matrix S such 
that $B = S^{-1}AS$. Check your result by multiplying $S^{-1}AS$ to obtain B. 


Problem 8.14 If A is an $n \times n$ matrix, show that the matrices A and $A^T$ 
have the same characteristic polynomial. Deduce that A and $A^T$ have 
the same eigenvalues. 


9 


Applications of 
diagonalisation 


We will now look at some applications of diagonalisation. We apply 
diagonalisation to find powers of diagonalisable matrices. We also solve 
systems of simultaneous linear difference equations. In particular, we 
look at the important topic of Markov chains. We also look at systems of 
differential equations. (Do not worry if you are unfamiliar with 
difference or differential equations. The key ideas you’ll need are discussed.) 
We will see that the diagonalisation process makes the solution of linear 
systems of difference and differential equations possible by essentially 
changing basis to one in which the problem is readily solvable, namely 
a basis of $\mathbb{R}^n$ consisting of eigenvectors of the matrix describing the 
system. 


9.1 Powers of matrices 


For a positive integer n, the nth power of a matrix A is simply 

$$A^n = \underbrace{A\,A\,A \cdots A}_{n \text{ times}}.$$


Example 9.1 Consider the matrix 

$$A = \begin{pmatrix} 7 & -15 \\ 2 & -4 \end{pmatrix}$$

(which we met in Example 8.3). We have 

$$A^2 = AA = \begin{pmatrix} 7 & -15 \\ 2 & -4 \end{pmatrix}\begin{pmatrix} 7 & -15 \\ 2 & -4 \end{pmatrix}
 = \begin{pmatrix} 19 & -45 \\ 6 & -14 \end{pmatrix},$$

$$A^3 = A.A.A = A.A^2 = \begin{pmatrix} 7 & -15 \\ 2 & -4 \end{pmatrix}\begin{pmatrix} 19 & -45 \\ 6 & -14 \end{pmatrix}
 = \begin{pmatrix} 43 & -105 \\ 14 & -34 \end{pmatrix}.$$


It is often useful, as we shall see in this chapter, to determine $A^n$ for a 
general integer n. As you can see from Example 9.1, we could calculate 
$A^n$ by performing $n - 1$ matrix multiplications. But it would be more 
satisfying (and easier) to have a ‘formula’ for the nth power, a matrix 
expression involving n into which one could substitute any desired 
value of n. Diagonalisation helps here. If we can write $P^{-1}AP = D$, 
then $A = PDP^{-1}$ and so 

$$\begin{aligned}
A^n &= \underbrace{A\,A\,A \cdots A}_{n \text{ times}} \\
    &= \underbrace{(PDP^{-1})(PDP^{-1})(PDP^{-1}) \cdots (PDP^{-1})}_{n \text{ times}} \\
    &= PD(P^{-1}P)D(P^{-1}P)D(P^{-1}P) \cdots D(P^{-1}P)DP^{-1} \\
    &= PDIDIDI \cdots DIDP^{-1} \\
    &= P\underbrace{D\,D\,D \cdots D}_{n \text{ times}}P^{-1} \\
    &= PD^nP^{-1}.
\end{aligned}$$

The product $PD^nP^{-1}$ is easy to compute since $D^n$ is simply the 
diagonal matrix with entries equal to the nth power of those of D. 


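Activity 9.2 Convince yourself that if 

$$D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{pmatrix},
\quad \text{then} \quad
D^n = \begin{pmatrix} \lambda_1^n & 0 & \cdots & 0 \\ 0 & \lambda_2^n & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k^n \end{pmatrix}.$$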

Let’s look at an easy example that builds on some work we did in the 
previous chapter. 


Example 9.3 As mentioned in the previous chapter, the matrix A 
from Example 8.3 is diagonalisable: if 

$$P = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix},$$

then P is invertible and 

$$P^{-1}AP = D = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.$$

Suppose we want to find an expression for $A^n$. Well, since $P^{-1}AP = D$, 
we have $A = PDP^{-1}$ and so, as explained above, 

$$\begin{aligned}
A^n = PD^nP^{-1}
 &= \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 2^n \end{pmatrix}\begin{pmatrix} -1 & 3 \\ 2 & -5 \end{pmatrix} \\
 &= \begin{pmatrix} -5 + 6(2^n) & 15 - 15(2^n) \\ -2 + 2(2^n) & 6 - 5(2^n) \end{pmatrix}.
\end{aligned}$$

You can see that the cases n = 2, 3 in Example 9.1 clearly comply with 
this general formula for the nth power. 
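
The agreement between the formula and direct multiplication can also be checked by machine. The Python sketch below compares the two for the matrix $A = \begin{pmatrix} 7 & -15 \\ 2 & -4 \end{pmatrix}$, using $P = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix}$, $P^{-1} = \begin{pmatrix} -1 & 3 \\ 2 & -5 \end{pmatrix}$ and $D = \operatorname{diag}(1, 2)$. The function names are our own and are chosen only for illustration.

```python
def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


A = [[7, -15], [2, -4]]
P = [[5, 3], [2, 1]]
P_INV = [[-1, 3], [2, -5]]   # inverse of P (note that |P| = -1)


def power_by_multiplication(n):
    """A^n for n >= 1, using n - 1 matrix multiplications."""
    result = A
    for _ in range(n - 1):
        result = matmul(result, A)
    return result


def power_by_diagonalisation(n):
    """A^n = P D^n P^{-1}, where D = diag(1, 2) so D^n = diag(1, 2^n)."""
    d_n = [[1 ** n, 0], [0, 2 ** n]]
    return matmul(matmul(P, d_n), P_INV)


for n in range(1, 11):
    assert power_by_multiplication(n) == power_by_diagonalisation(n)

print(power_by_diagonalisation(3))   # [[43, -105], [14, -34]]
```

The second function needs only two matrix multiplications whatever the value of n, which is the practical advantage of the formula.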


Here is another fairly easy example, which we will fully work through. 


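Example 9.4 Suppose that we want a matrix expression for the nth 
power of the matrix 

$$A = \begin{pmatrix} 1 & 4 \\ \tfrac{1}{2} & 0 \end{pmatrix}.$$

The characteristic polynomial $|A - \lambda I|$ is (check this!) 

$$\lambda^2 - \lambda - 2 = (\lambda - 2)(\lambda + 1).$$

So the eigenvalues are $-1$ and 2. An eigenvector for $-1$ is a solution of 
$(A + I)\mathbf{v} = \mathbf{0}$, found by 

$$A + I = \begin{pmatrix} 2 & 4 \\ \tfrac{1}{2} & 1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix},$$

so we may take $(2, -1)^T$. Eigenvectors for 2 are given by 

$$A - 2I = \begin{pmatrix} -1 & 4 \\ \tfrac{1}{2} & -2 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & -4 \\ 0 & 0 \end{pmatrix},$$

so we may take $(4, 1)^T$. Let P be the matrix whose columns are these 
eigenvectors. Then 

$$P = \begin{pmatrix} 2 & 4 \\ -1 & 1 \end{pmatrix}.$$

The inverse is 

$$P^{-1} = \frac{1}{6}\begin{pmatrix} 1 & -4 \\ 1 & 2 \end{pmatrix}.$$

We have $P^{-1}AP = D = \operatorname{diag}(-1, 2)$. The nth power of the matrix A 
is given by 

$$\begin{aligned}
A^n = PD^nP^{-1}
 &= \frac{1}{6}\begin{pmatrix} 2 & 4 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} (-1)^n & 0 \\ 0 & 2^n \end{pmatrix}\begin{pmatrix} 1 & -4 \\ 1 & 2 \end{pmatrix} \\
 &= \frac{1}{6}\begin{pmatrix} 2(-1)^n + 4(2^n) & -8(-1)^n + 8(2^n) \\ -(-1)^n + 2^n & 4(-1)^n + 2(2^n) \end{pmatrix}.
\end{aligned}$$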

Activity 9.5 Check the calculations in the examples just given. 


9.2 Systems of difference equations 


9.2.1 Introduction to difference equations 


A difference equation is an equation linking the terms of a sequence to 
previous terms. For example, $x_{t+1} = 5x_t - 1$ is a first-order difference 
equation for the sequence $x_t$. (It is said to be first-order because the 
relationship expressing $x_{t+1}$ involves only the previous term.) If you 
have a first-order difference equation, once you know the first term 
of the sequence, the relationship determines all the other terms of the 
sequence. Difference equations are also often referred to as recurrence 
equations. Here t is always a non-negative integer: $t \in \mathbb{Z}$, $t \ge 0$. 

By a solution of a difference equation, we mean an expression for 
the term $x_t$ which involves t and the first term of the sequence (the initial 
condition). One very simple result we will need is that the solution to 
the difference equation 

$$x_{t+1} = ax_t$$

is simply 

$$x_t = a^t x_0,$$

where $x_0$ is the first term of the sequence. (We assume that the members 
of the sequence are labeled as $x_0, x_1, x_2, \ldots$, rather than $x_1, x_2, \ldots$.) 
You might recognise these as the terms of a geometric progression, if 
you have studied those before. 
This result is easily established. If $x_{t+1} = ax_t$, we have 

$$\begin{aligned}
x_1 &= ax_0 \\
x_2 &= ax_1 = a(ax_0) = a^2x_0 \\
x_3 &= ax_2 = a(a^2x_0) = a^3x_0 \\
    &\;\;\vdots \\
x_t &= a^tx_0.
\end{aligned}$$
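
As a small illustration, the following Python sketch compares the result of applying the recurrence $x_{t+1} = ax_t$ step by step with the formula $x_t = a^tx_0$. The values $a = 3$ and $x_0 = 2$ are chosen arbitrarily for the illustration and do not come from the text; the function name `iterate` is our own.

```python
def iterate(a, x0, t):
    """Apply x_{t+1} = a * x_t exactly t times, starting from x0."""
    x = x0
    for _ in range(t):
        x = a * x
    return x


a, x0 = 3, 2   # illustrative values only
for t in range(8):
    assert iterate(a, x0, t) == a ** t * x0   # matches the formula a^t x_0

print([iterate(a, x0, t) for t in range(5)])   # [2, 6, 18, 54, 162]
```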


9.2.2 Systems of difference equations 


We shall now see how we can use diagonalisation to solve (linear) 
systems of difference equations. This is a powerful and important 
application of diagonalisation. We introduce the ideas with a fairly manageable 
example. 


Example 9.6 Suppose the sequences $x_t$ and $y_t$ are related as follows: 
$x_0 = 1$, $y_0 = 1$, and, for $t \ge 0$, 

$$x_{t+1} = 7x_t - 15y_t, \qquad (9.1)$$

$$y_{t+1} = 2x_t - 4y_t. \qquad (9.2)$$

This is an example of a coupled system of difference equations. We 
cannot directly solve equation (9.1) for $x_t$ since we would need to know 
$y_t$. On the other hand, we can’t work out $y_t$ directly from equation (9.2) 
because to do so we would need to know $x_t$! You might think that 
it therefore seems impossible. However, there is a way to solve the 
problem, and it uses diagonalisation. 


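Example 9.6 continued Let us notice that the system we’re considering 
can be expressed in matrix form. If we let 

$$\mathbf{x}_t = \begin{pmatrix} x_t \\ y_t \end{pmatrix},$$

then the problem is to find $\mathbf{x}_t$ given that $\mathbf{x}_{t+1} = A\mathbf{x}_t$ for $t \ge 0$ and given 
that $\mathbf{x}_0 = (1, 1)^T$, where A (our old friend from the previous chapter 
and Example 9.1) is 

$$A = \begin{pmatrix} 7 & -15 \\ 2 & -4 \end{pmatrix}.$$

We’re very familiar with the matrix A. We know how to diagonalise it 
and we know its nth power from Example 9.3. The expression we have 
for the nth power is immediately useful here. We have $\mathbf{x}_{t+1} = A\mathbf{x}_t$ for 
$t \ge 0$. So, 

$$\begin{aligned}
\mathbf{x}_1 &= A\mathbf{x}_0, \\
\mathbf{x}_2 &= A\mathbf{x}_1 = A(A\mathbf{x}_0) = A^2\mathbf{x}_0, \\
\mathbf{x}_3 &= A\mathbf{x}_2 = A(A^2\mathbf{x}_0) = A^3\mathbf{x}_0, \\
             &\;\;\vdots
\end{aligned}$$

and, in general, 

$$\mathbf{x}_t = A^t\mathbf{x}_0.$$

But we know (from Example 9.3) an expression for $A^t$. So we can see 
that 

$$\begin{pmatrix} x_t \\ y_t \end{pmatrix} = \mathbf{x}_t = A^t\mathbf{x}_0
 = \begin{pmatrix} -5 + 6(2^t) & 15 - 15(2^t) \\ -2 + 2(2^t) & 6 - 5(2^t) \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix}
 = \begin{pmatrix} 10 - 9(2^t) \\ 4 - 3(2^t) \end{pmatrix}.$$

Therefore, the sequences are 

$$x_t = 10 - 9(2^t), \qquad y_t = 4 - 3(2^t).$$


This example demonstrates a very general approach to solving systems 
of difference equations where the underlying matrix A is diagonalisable. 

Note that we could equally well express the system $\mathbf{x}_{t+1} = A\mathbf{x}_t$ for 
$t \ge 0$, as $\mathbf{x}_t = A\mathbf{x}_{t-1}$ for $t \ge 1$. The systems are exactly the same. 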
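
A solution of this kind is easy to test, because the system itself tells us how to generate the sequences one term at a time. The Python sketch below does this for Example 9.6, where $x_{t+1} = 7x_t - 15y_t$ and $y_{t+1} = 2x_t - 4y_t$ with $x_0 = y_0 = 1$, and compares each term with the formulas $x_t = 10 - 9(2^t)$ and $y_t = 4 - 3(2^t)$. The function names `step` and `closed_form` are our own.

```python
def step(x, y):
    """One step of the coupled system: returns (x_{t+1}, y_{t+1})."""
    return 7 * x - 15 * y, 2 * x - 4 * y


def closed_form(t):
    """The solution (x_t, y_t) obtained by diagonalisation."""
    return 10 - 9 * 2 ** t, 4 - 3 * 2 ** t


x, y = 1, 1   # initial conditions x_0 = 1, y_0 = 1
for t in range(10):
    assert (x, y) == closed_form(t)   # the formula matches direct iteration
    x, y = step(x, y)

print(closed_form(1))   # (-8, -2)
```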


9.2.3 Solving using matrix powers 


Suppose we want to solve a system $\mathbf{x}_{t+1} = A\mathbf{x}_t$, in which A is 
diagonalisable. As we have seen, we can use diagonalisation to determine the 
powers of the matrix and, as indicated above, this can help us to solve 
the system. The key ideas are encapsulated in Example 9.6. We now 
illustrate further with an example involving three sequences, in which 
the underlying matrix is therefore a $3 \times 3$ matrix. 


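Example 9.7 The system we consider is as follows. We want to find 
the sequences $x_t, y_t, z_t$, which satisfy the difference equations 

$$\begin{aligned}
x_{t+1} &= 6x_t + 13y_t - 8z_t \\
y_{t+1} &= 2x_t + 5y_t - 2z_t \\
z_{t+1} &= 7x_t + 17y_t - 9z_t
\end{aligned}$$

and the initial conditions $x_0 = 1$, $y_0 = 1$, $z_0 = 0$. 
In matrix form, this system is $\mathbf{x}_{t+1} = A\mathbf{x}_t$, where 

$$A = \begin{pmatrix} 6 & 13 & -8 \\ 2 & 5 & -2 \\ 7 & 17 & -9 \end{pmatrix}, \quad
\mathbf{x}_t = \begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix}.$$

We need to diagonalise A. You will probably have worked through this 
diagonalisation yourself in Exercise 8.5. If we take 

$$P = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix} \quad \text{and} \quad
D = \begin{pmatrix} -2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix},$$

then $P^{-1}AP = D$. Now, as you can calculate, 

$$P^{-1} = \begin{pmatrix} -1 & -3 & 2 \\ -1 & -1 & 1 \\ 1 & 2 & -1 \end{pmatrix}.$$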

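It follows that $A^t = PD^tP^{-1}$, so that 

$$\mathbf{x}_t = A^t\mathbf{x}_0 = PD^tP^{-1}\mathbf{x}_0.$$

Therefore, the solution is given by 

$$\mathbf{x}_t = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} (-2)^t & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3^t \end{pmatrix}
\begin{pmatrix} -1 & -3 & 2 \\ -1 & -1 & 1 \\ 1 & 2 & -1 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}.$$

You can multiply these matrices in any order, but the simplest way is to 
begin at the right with 

$$P^{-1}\mathbf{x}_0 = \begin{pmatrix} -1 & -3 & 2 \\ -1 & -1 & 1 \\ 1 & 2 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}
 = \begin{pmatrix} -4 \\ -2 \\ 3 \end{pmatrix},$$

so that 

$$\begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix}
 = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}\begin{pmatrix} -4(-2)^t \\ -2 \\ 3(3^t) \end{pmatrix}
 = \begin{pmatrix} -4(-2)^t + 2 + 3(3^t) \\ -2 + 3(3^t) \\ -4(-2)^t - 2 + 6(3^t) \end{pmatrix}.$$

The sequences are 

$$\begin{aligned}
x_t &= -4(-2)^t + 2 + 3(3^t) \\
y_t &= -2 + 3(3^t) \\
z_t &= -4(-2)^t - 2 + 6(3^t).
\end{aligned}$$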

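How can we check that this solution is correct? We should at least check 
that it gives us the correct initial conditions by substituting $t = 0$ into 
the solution. This is easily done. 


Activity 9.8 Do this! 


We can also find $\mathbf{x}_1$ in two different ways. The original equations will 
give us 

$$\mathbf{x}_1 = A\mathbf{x}_0 = \begin{pmatrix} 6 & 13 & -8 \\ 2 & 5 & -2 \\ 7 & 17 & -9 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}
 = \begin{pmatrix} 19 \\ 7 \\ 24 \end{pmatrix}.$$

If we get the same result from our solution, then we can be fairly certain 
that our solution is correct. According to our solution, 

$$\mathbf{x}_1 = \begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix}
 = \begin{pmatrix} -4(-2) + 2 + 3(3) \\ -2 + 3(3) \\ -4(-2) - 2 + 6(3) \end{pmatrix}
 = \begin{pmatrix} 19 \\ 7 \\ 24 \end{pmatrix}.$$

So we do indeed obtain the same answer. 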
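
The same check can be extended to as many terms as we like. As a sketch only, the Python code below iterates $\mathbf{x}_{t+1} = A\mathbf{x}_t$ for the matrix A of Example 9.7, starting from $\mathbf{x}_0 = (1, 1, 0)^T$, and compares each term with the solution $x_t = -4(-2)^t + 2 + 3(3^t)$, $y_t = -2 + 3(3^t)$, $z_t = -4(-2)^t - 2 + 6(3^t)$. The helper names `matvec` and `closed_form` are our own.

```python
def matvec(m, v):
    """Multiply a matrix (a list of rows) by a vector (a list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


A = [[6, 13, -8], [2, 5, -2], [7, 17, -9]]


def closed_form(t):
    """The solution [x_t, y_t, z_t] found in Example 9.7."""
    return [-4 * (-2) ** t + 2 + 3 * 3 ** t,
            -2 + 3 * 3 ** t,
            -4 * (-2) ** t - 2 + 6 * 3 ** t]


x = [1, 1, 0]   # initial conditions x_0, y_0, z_0
for t in range(10):
    assert x == closed_form(t)   # the formula agrees with direct iteration
    x = matvec(A, x)             # advance one step

print(closed_form(1))   # [19, 7, 24]
```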


Activity 9.9 Carry out any omitted calculations in this example. 


9.2.4 Solving by change of variable 


We can use diagonalisation as the key to another general method for 
solving systems of difference equations. Given a system $\mathbf{x}_{t+1} = A\mathbf{x}_t$, in 
which A is diagonalisable, we perform a change of variable or change 
of coordinates, as follows. Suppose that $P^{-1}AP = D$ (where D is 
diagonal) and let 

$$\mathbf{x}_t = P\mathbf{u}_t.$$

Equivalently, the new variable vector $\mathbf{u}_t$ is $\mathbf{u}_t = P^{-1}\mathbf{x}_t$. One way of 
thinking about this is that the vector $\mathbf{x}_t$ is in standard coordinates and 
$\mathbf{u}_t$ is in coordinates in the basis of eigenvectors. Then substituting 
$\mathbf{x}_t = P\mathbf{u}_t$ into the equation $\mathbf{x}_{t+1} = A\mathbf{x}_t$, and noting that 
$\mathbf{x}_{t+1} = P\mathbf{u}_{t+1}$, the equation becomes 

$$P\mathbf{u}_{t+1} = AP\mathbf{u}_t,$$

which means that 

$$\mathbf{u}_{t+1} = P^{-1}AP\mathbf{u}_t = D\mathbf{u}_t.$$

Since D is diagonal, this is very easy to solve for $\mathbf{u}_t$. To find $\mathbf{x}_t$, we then 
use the fact that $\mathbf{x}_t = P\mathbf{u}_t$. 
We will illustrate the method using the system in Example 9.7 


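Example 9.10 We find the sequences $x_t, y_t, z_t$ which satisfy the
difference equations
\begin{align*}
x_{t+1} &= 6x_t + 13y_t - 8z_t \\
y_{t+1} &= 2x_t + 5y_t - 2z_t \\
z_{t+1} &= 7x_t + 17y_t - 9z_t
\end{align*}
and the initial conditions $x_0 = 1$, $y_0 = 1$, $z_0 = 0$.

Using the matrices $A$, $P$ and $D$ given in Example 9.7, we let
\[
\mathbf{u}_t = \begin{pmatrix} u_t \\ v_t \\ w_t \end{pmatrix}
\]
be given by $\mathbf{x}_t = P\mathbf{u}_t$. Then the equation
$\mathbf{x}_{t+1} = A\mathbf{x}_t$ gives rise (as explained above) to
$\mathbf{u}_{t+1} = D\mathbf{u}_t$. That is,
\[
\begin{pmatrix} u_{t+1} \\ v_{t+1} \\ w_{t+1} \end{pmatrix}
= \begin{pmatrix} -2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix}
\begin{pmatrix} u_t \\ v_t \\ w_t \end{pmatrix},
\]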
so we have the following system for the new sequences $u_t, v_t, w_t$:
\begin{align*}
u_{t+1} &= -2u_t \\
v_{t+1} &= v_t \\
w_{t+1} &= 3w_t.
\end{align*}


This is very easy to solve: each equation involves only one sequence, 
so we have uncoupled the equations. We have, for all t, 


\[
u_t = (-2)^t u_0, \qquad v_t = v_0, \qquad w_t = 3^t w_0.
\]


We have not yet solved the original problem, however, since we need to 
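find $x_t, y_t, z_t$. We have
\[
\mathbf{x}_t = \begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix}
= P\mathbf{u}_t
= \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} u_t \\ v_t \\ w_t \end{pmatrix}
= \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} (-2)^t u_0 \\ v_0 \\ 3^t w_0 \end{pmatrix}.
\]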


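But we have also to find out what $u_0, v_0, w_0$ are. These are not given in
the problem, but $x_0, y_0, z_0$ are, and we know that
\[
\begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix}
= P \begin{pmatrix} u_0 \\ v_0 \\ w_0 \end{pmatrix}
= \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} u_0 \\ v_0 \\ w_0 \end{pmatrix}.
\]
To find $u_0, v_0, w_0$, we can either solve the linear system
\[
\begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} u_0 \\ v_0 \\ w_0 \end{pmatrix}
= \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}
\]
using row operations, or we can (though it may involve more work) find
out what $P^{-1}$ is and use the fact that
$\mathbf{u}_0 = P^{-1}\mathbf{x}_0$,
\[
\begin{pmatrix} u_0 \\ v_0 \\ w_0 \end{pmatrix}
= P^{-1} \begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix}
= P^{-1} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}.
\]
Either way (and the working is omitted here, but you should check it),
we find
\[
\begin{pmatrix} u_0 \\ v_0 \\ w_0 \end{pmatrix}
= \begin{pmatrix} -4 \\ -2 \\ 3 \end{pmatrix}.
\]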
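Returning then to the general solution to the system, we obtain
\[
\begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix}
= \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} (-2)^t u_0 \\ v_0 \\ 3^t w_0 \end{pmatrix}
= \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} -4(-2)^t \\ -2 \\ 3(3^t) \end{pmatrix},
\]
so we have the solution
\[
\begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix}
= \begin{pmatrix} -4(-2)^t + 2 + 3(3^t) \\ -2 + 3(3^t) \\ -4(-2)^t - 2 + 6(3^t) \end{pmatrix}.
\]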


And, of course, this is in agreement with the answer obtained earlier 
using matrix powers. 
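
The three steps of the change-of-variable method (find $\mathbf{u}_0 = P^{-1}\mathbf{x}_0$, solve the uncoupled system, then return to $\mathbf{x}_t = P\mathbf{u}_t$) can be sketched in a few lines of Python. This is only an illustration for the matrices of this example; the function names `mat_vec` and `solve` are hypothetical, and $P^{-1}$ is typed in rather than computed.

```python
# P, its inverse, and the diagonal entries of D, as given in the example.
P = [[1, -1, 1],
     [0, 1, 1],
     [1, 1, 2]]
P_inv = [[-1, -3, 2],
         [-1, -1, 1],
         [1, 2, -1]]
eigenvalues = [-2, 1, 3]
x0 = [1, 1, 0]


def mat_vec(M, v):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


def solve(t):
    """Return x_t using the change of variable x_t = P u_t."""
    u0 = mat_vec(P_inv, x0)                                   # u_0 = P^{-1} x_0
    ut = [lam ** t * u for lam, u in zip(eigenvalues, u0)]    # u_t = D^t u_0
    return mat_vec(P, ut)                                     # x_t = P u_t


print(mat_vec(P_inv, x0))   # the new initial values (u_0, v_0, w_0)
print(solve(1))             # x_1, which should equal A x_0
```

Because each component of $\mathbf{u}_t$ depends only on its own eigenvalue, no matrix power is ever formed explicitly.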


Activity 9.11 Perform all the omitted calculations for this example. 


9.2.5 Another example 


Let’s find the sequences $x_t, y_t, z_t$ which satisfy the following system of
linear difference equations
\begin{align*}
x_{t+1} &= 4x_t + 4z_t \\
y_{t+1} &= 4y_t + 4z_t \\
z_{t+1} &= 4x_t + 4y_t + 8z_t
\end{align*}
and $x_0 = 6$, $y_0 = 12$, $z_0 = 12$.

We will do this by both methods described above (although, of 
course, you would only need to choose one of the methods and solve it 
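that way). In matrix form, this system is $\mathbf{x}_{t+1} = A\mathbf{x}_t$, where
\[
A = \begin{pmatrix} 4 & 0 & 4 \\ 0 & 4 & 4 \\ 4 & 4 & 8 \end{pmatrix},
\qquad
\mathbf{x}_t = \begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix}.
\]
This is the matrix we diagonalised in Example 8.23. There, we found
that $P^{-1}AP = D$, where
\[
P = \begin{pmatrix} -1 & -1 & 1 \\ 1 & -1 & 1 \\ 0 & 1 & 2 \end{pmatrix},
\qquad
D = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 12 \end{pmatrix}.
\]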


First, we use matrix powers. Since $\mathbf{x}_{t+1} = A\mathbf{x}_t$, we therefore have
\[
\mathbf{x}_t = A^t\mathbf{x}_0.
\]




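Because $A = PDP^{-1}$, we have $A^t = PD^tP^{-1}$, so
\[
\mathbf{x}_t = PD^tP^{-1}\mathbf{x}_0.
\]
Now, as you can calculate,
\[
P^{-1} = \frac{1}{6}\begin{pmatrix} -3 & 3 & 0 \\ -2 & -2 & 2 \\ 1 & 1 & 2 \end{pmatrix}.
\]
Therefore, the solution is given by
\[
\mathbf{x}_t =
\begin{pmatrix} -1 & -1 & 1 \\ 1 & -1 & 1 \\ 0 & 1 & 2 \end{pmatrix}
\begin{pmatrix} 4^t & 0 & 0 \\ 0 & 0^t & 0 \\ 0 & 0 & 12^t \end{pmatrix}
\frac{1}{6}\begin{pmatrix} -3 & 3 & 0 \\ -2 & -2 & 2 \\ 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} 6 \\ 12 \\ 12 \end{pmatrix}.
\]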


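Now (and this is a fact you may not have seen before) $0^t = 0$ for all
$t \ge 1$, but $0^0 = 1$ (since by definition $x^0 = 1$ for all real numbers $x$).
So for all $t \ge 1$, we have
\[
\begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix}
= \begin{pmatrix} -1 & -1 & 1 \\ 1 & -1 & 1 \\ 0 & 1 & 2 \end{pmatrix}
\begin{pmatrix} 4^t & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 12^t \end{pmatrix}
\begin{pmatrix} 3 \\ -2 \\ 7 \end{pmatrix}.
\]
That is, for $t \ge 1$,
\[
\begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix}
= \begin{pmatrix} -3(4^t) + 7(12^t) \\ 3(4^t) + 7(12^t) \\ 14(12^t) \end{pmatrix},
\]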


and, of course (since this is given),
\[
\mathbf{x}_0 = \begin{pmatrix} 6 \\ 12 \\ 12 \end{pmatrix}.
\]
(You might observe that if you take the expression for $\mathbf{x}_t$ given for
$t \ge 1$, and if you substitute $t = 0$, you don’t get the right value for
$\mathbf{x}_0$. That isn’t because the solution is wrong; it’s simply because
that expression only works for $t \ge 1$. Look at how it is obtained: we set
$0^t$ equal to 0, something that is true for $t \ge 1$ but not for $t = 0$.)

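For the second method (in which we change the variable), let
\[
\mathbf{u}_t = \begin{pmatrix} u_t \\ v_t \\ w_t \end{pmatrix}
\]
be given by $\mathbf{x}_t = P\mathbf{u}_t$. Then the equation
$\mathbf{x}_{t+1} = A\mathbf{x}_t$ becomes $P\mathbf{u}_{t+1} = AP\mathbf{u}_t$,
or $\mathbf{u}_{t+1} = P^{-1}AP\mathbf{u}_t$; that is,
$\mathbf{u}_{t+1} = D\mathbf{u}_t$,
\[
\begin{pmatrix} u_{t+1} \\ v_{t+1} \\ w_{t+1} \end{pmatrix}
= \begin{pmatrix} 4 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 12 \end{pmatrix}
\begin{pmatrix} u_t \\ v_t \\ w_t \end{pmatrix}.
\]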
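So we have the following system for the new sequences $u_t, v_t, w_t$:
\begin{align*}
u_{t+1} &= 4u_t \\
v_{t+1} &= 0v_t \\
w_{t+1} &= 12w_t,
\end{align*}
with solutions
\[
u_t = 4^t u_0, \qquad v_t = 0 \ \text{for } t \ge 1, \qquad w_t = 12^t w_0.
\]
To find $u_0, v_0, w_0$, we use $\mathbf{u}_0 = P^{-1}\mathbf{x}_0$. As calculated earlier,
\[
\mathbf{u}_0 = \begin{pmatrix} u_0 \\ v_0 \\ w_0 \end{pmatrix}
= \frac{1}{6}\begin{pmatrix} -3 & 3 & 0 \\ -2 & -2 & 2 \\ 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} 6 \\ 12 \\ 12 \end{pmatrix}
= \begin{pmatrix} 3 \\ -2 \\ 7 \end{pmatrix}.
\]
Then the solution is
\begin{align*}
\mathbf{x}_t = \begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix}
&= \begin{pmatrix} -1 & -1 & 1 \\ 1 & -1 & 1 \\ 0 & 1 & 2 \end{pmatrix}
\begin{pmatrix} 4^t u_0 \\ 0 \\ 12^t w_0 \end{pmatrix}
= \begin{pmatrix} -1 & -1 & 1 \\ 1 & -1 & 1 \\ 0 & 1 & 2 \end{pmatrix}
\begin{pmatrix} 3(4^t) \\ 0 \\ 7(12^t) \end{pmatrix} \\
&= \begin{pmatrix} -3(4^t) + 7(12^t) \\ 3(4^t) + 7(12^t) \\ 14(12^t) \end{pmatrix},
\quad \text{for } t \ge 1,
\end{align*}
and $\mathbf{x}_0 = (6, 12, 12)^T$. This is in agreement with the answer obtained
earlier using matrix powers.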


Activity 9.12 Check all the calculations in this section. 


Activity 9.13 Check the solutions by finding $\mathbf{x}_1$. See what happens if
you keep $0^t$ as part of your solution in either method; will the solution
then work for all $t \ge 0$?


9.2.6 Markov Chains 


To illustrate just what a Markov chain is, let’s begin by looking at an 
example. 


Example 9.14 Suppose two supermarkets compete for customers in 
a region with 20000 shoppers. Assume that no shopper goes to both 
supermarkets in any week, and that the table below gives the proba- 
bilities that a shopper will change from one supermarket (or none) to 
another (or none) during the week.


From A From B From none 
To A 0.70 0.15 0.30 
To B 0.20 0.80 0.20 
To none 0.10 0.05 0.50 


For example, the second column tells us that during any given week 
supermarket B will keep 80% of its customers while losing 15% to 
supermarket A and 5% to no supermarket. Notice that the probabilities 
in the column add up to 1, since every shopper has to end up somewhere 
in the following week. 

Suppose that at the end of a certain week (call it week zero), it is
known that the total population of $T = 20\,000$ shoppers was distributed
as follows: 10 000 (that is, $0.5T$) went to supermarket A, 8 000 ($0.4T$)
went to supermarket B and 2 000 ($0.1T$) did not go to a supermarket.

Given this information, the questions we wish to answer are: ‘Can
we predict the number of shoppers at each supermarket in any future
week $t$?’, and ‘Can we predict a long-term distribution of shoppers?’

In order to answer these questions, we formulate the problem as
a system of linear difference equations. Let $x_t$ denote the (decimal)
percentage of total shoppers going to supermarket A in week $t$, $y_t$ the
percentage going to supermarket B and $z_t$ the percentage who do not go
to any supermarket. The numbers of shoppers in week $t$ can be predicted
by this model from the numbers in the previous week; that is,
\[
\mathbf{x}_t = A\mathbf{x}_{t-1},
\]
where
\[
A = \begin{pmatrix} 0.70 & 0.15 & 0.30 \\ 0.20 & 0.80 & 0.20 \\ 0.10 & 0.05 & 0.50 \end{pmatrix},
\qquad
\mathbf{x}_t = \begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix},
\]
and $x_0 = 0.5$, $y_0 = 0.4$, $z_0 = 0.1$.
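
To get a feel for the model before solving it, we can simply apply the equation $\mathbf{x}_t = A\mathbf{x}_{t-1}$ week by week. The following Python sketch does this for the first three weeks; the names `step`, `history` and `TOTAL` are our own, and rounding to whole shoppers is an assumption made only for display.

```python
# Transition matrix and week-zero distribution from the example.
A = [[0.70, 0.15, 0.30],
     [0.20, 0.80, 0.20],
     [0.10, 0.05, 0.50]]
x = [0.5, 0.4, 0.1]
TOTAL = 20000   # total number of shoppers


def step(A, x):
    """One week of the model: return A x."""
    return [sum(a * c for a, c in zip(row, x)) for row in A]


history = [x]
for week in range(1, 4):
    x = step(A, x)
    history.append(x)
    # Convert the proportions to (rounded) numbers of shoppers.
    print(week, [round(TOTAL * p) for p in x])
```

Each entry of `history` is a distribution vector, so its entries should always sum to 1 (up to floating-point rounding).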


What features of this problem make it a Markov chain? In general, a 
Markov chain or a Markov process is a closed system consisting of a 
fixed total population which is distributed into n different states, and 
which changes during specific time intervals from one distribution to 
another. We assume that we know the probability that a given member 
will change from one state into another, depending on the state the 
member occupied during the previous time interval. 

These probabilities are listed in an $n \times n$ matrix $A$, where the $(i, j)$
entry is the probability that a member of the population will change
from state $j$ to state $i$. Such a matrix is called a transition matrix of a
Markov chain.


Definition 9.15 The $n \times n$ matrix $A = (a_{ij})$ is a transition matrix of a
Markov chain if it satisfies the following two properties:

(1) The entries of $A$ are all non-negative.
(2) The sum of the entries in each column of $A$ is equal to 1:
\[
a_{1j} + a_{2j} + \cdots + a_{nj} = 1.
\]

Property (2) follows from the assumption that all members of the
population must be in one of the $n$ states at any given time. (Informally, all
those at state $j$ have to be somewhere at the next observation time, so
the sum (over all $i$) of the ‘transition probabilities’ of going from state
$j$ to state $i$, must equal 1.)
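
The two properties in the definition are easy to test mechanically. The following Python sketch is one possible check; the function name `is_transition_matrix` and the tolerance `tol` (needed because decimal entries are stored only approximately) are our own assumptions.

```python
def is_transition_matrix(A, tol=1e-12):
    """Check the two defining properties of a transition matrix.

    A is a square matrix given as a list of rows.
    """
    n = len(A)
    if any(len(row) != n for row in A):
        return False                      # not square
    if any(entry < 0 for row in A for entry in row):
        return False                      # property (1) fails
    # Property (2): every column sums to 1 (within the tolerance).
    return all(abs(sum(A[i][j] for i in range(n)) - 1) <= tol
               for j in range(n))


supermarkets = [[0.70, 0.15, 0.30],
                [0.20, 0.80, 0.20],
                [0.10, 0.05, 0.50]]
print(is_transition_matrix(supermarkets))
```

Note that the test is on columns, not rows: a matrix whose rows sum to 1 need not pass.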

The distribution vector (or state vector) for the time period $t$ is the
vector $\mathbf{x}_t$, whose $i$th entry is the percentage of the population in
state $i$ at time $t$. The entries of $\mathbf{x}_t$ sum to 1 because all members
of the population must be in one of the states at any time. Our first goal is
to find the state vector for any $t$, and to do this we need to solve the
difference equation
\[
\mathbf{x}_t = A\mathbf{x}_{t-1}.
\]
A solution of the difference equation is an expression for the distribution
vector $\mathbf{x}_t$ in terms of $A$ and $\mathbf{x}_0$, and, as we have seen
earlier, the solution is $\mathbf{x}_t = A^t\mathbf{x}_0$.

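Now assume that $A$ can be diagonalised. If $A$ has eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_n$ with corresponding eigenvectors
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$, then $P^{-1}AP = D$ where
$P$ is the matrix of eigenvectors of $A$ and $D$ is the corresponding diagonal
matrix of eigenvalues.

The solution of the difference equation is
\[
\mathbf{x}_t = A^t\mathbf{x}_0 = (PD^tP^{-1})\mathbf{x}_0.
\]
Let’s examine this solution to see what it tells us. If we set
$\mathbf{x}_0 = P\mathbf{z}$, so that
$\mathbf{z} = P^{-1}\mathbf{x}_0 = (b_1, b_2, \ldots, b_n)^T$ represents the
coordinates of $\mathbf{x}_0$ in the basis of eigenvectors, then this solution
can be written in vector form as
\begin{align*}
\mathbf{x}_t &= PD^t(P^{-1}\mathbf{x}_0) \\
&= \begin{pmatrix} | & | & & | \\ \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \\ | & | & & | \end{pmatrix}
\begin{pmatrix} \lambda_1^t & 0 & \cdots & 0 \\ 0 & \lambda_2^t & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^t \end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \\
&= b_1\lambda_1^t\mathbf{v}_1 + b_2\lambda_2^t\mathbf{v}_2 + \cdots + b_n\lambda_n^t\mathbf{v}_n.
\end{align*}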

Activity 9.16 Make sure you understand how the final equality above
follows from matrix multiplication properties.


Now let’s return to our example. 


Example 9.14 continued We will use this solution to find the number 
of shoppers using each of the supermarkets at the end of week t, and 
see if we can use this information to predict the long-term distribution 
of shoppers. 

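First, we diagonalise the matrix $A$. The characteristic equation of $A$
is
\[
|A - \lambda I| =
\begin{vmatrix} 0.70 - \lambda & 0.15 & 0.30 \\ 0.20 & 0.80 - \lambda & 0.20 \\ 0.10 & 0.05 & 0.50 - \lambda \end{vmatrix}
= -\lambda^3 + 2\lambda^2 - 1.24\lambda + 0.24 = 0.
\]
This equation is satisfied by $\lambda = 1$, and hence 1 is an eigenvalue. Using
the fact that $(\lambda - 1)$ is a factor of the polynomial, we find
\[
(\lambda - 1)(\lambda^2 - \lambda + 0.24) = (\lambda - 1)(\lambda - 0.6)(\lambda - 0.4) = 0,
\]
so the eigenvalues are $\lambda_1 = 1$, $\lambda_2 = 0.6$, and $\lambda_3 = 0.4$.
The corresponding eigenvectors $\mathbf{v}_i$ are found by solving the
homogeneous systems $(A - \lambda_i I)\mathbf{v} = \mathbf{0}$. (We omit the
calculations.) Writing them as the columns of a matrix $P$, we find that
$P^{-1}AP = D$, where
\[
P = \begin{pmatrix} 3 & 3 & -1 \\ 4 & -4 & 0 \\ 1 & 1 & 1 \end{pmatrix},
\qquad
D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.6 & 0 \\ 0 & 0 & 0.4 \end{pmatrix}.
\]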


Activity 9.17 Carry out the omitted calculations for the diagonalisation 
above. 


The distribution vector $\mathbf{x}_t$ at any time $t$ is then given by
\[
\mathbf{x}_t = b_1(1)^t\mathbf{v}_1 + b_2(0.6)^t\mathbf{v}_2 + b_3(0.4)^t\mathbf{v}_3,
\]
where it only remains to find the coordinates $b_1, b_2, b_3$ of $\mathbf{x}_0$
in the basis of eigenvectors.

Before we do this, let’s see what the solution tells us about a long-term
distribution of shoppers. We want to know what happens to $\mathbf{x}_t$
for very large values of $t$; that is, as $t \to \infty$. Note (and this is very
important) that $1^t = 1$, and that as $t \to \infty$, $(0.6)^t \to 0$ and
$(0.4)^t \to 0$. So there is a long-term distribution: the limit of
$\mathbf{x}_t$ as $t \to \infty$ is a scalar multiple,
$\mathbf{q} = b_1\mathbf{v}_1$, of the eigenvector $\mathbf{v}_1$ whose
eigenvalue is 1.
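Now we’ll complete the solution by finding $b_1, b_2, b_3$. The
coordinates of $\mathbf{x}_0$ in the basis of eigenvectors are given by
\[
P^{-1}\mathbf{x}_0 =
\frac{1}{8}\begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ -2 & 0 & 6 \end{pmatrix}
\begin{pmatrix} 0.5 \\ 0.4 \\ 0.1 \end{pmatrix}
= \begin{pmatrix} 0.125 \\ 0.025 \\ -0.05 \end{pmatrix}
= \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}.
\]
Hence,
\[
\mathbf{x}_t = 0.125\begin{pmatrix} 3 \\ 4 \\ 1 \end{pmatrix}
+ 0.025(0.6)^t\begin{pmatrix} 3 \\ -4 \\ 1 \end{pmatrix}
- 0.05(0.4)^t\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix},
\]
and the long-term distribution is
\[
\mathbf{q} = \lim_{t \to \infty}\mathbf{x}_t
= \begin{pmatrix} 0.375 \\ 0.500 \\ 0.125 \end{pmatrix}.
\]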


Relating this to numbers of shoppers, and remembering that the total 
number of shoppers is 20000, the long-term distribution is predicted 
to be 20000q: 7500 to supermarket A, 10000 to B and 2500 to no 
supermarket. 
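
As a numerical sanity check, the eigenvector formula for $\mathbf{x}_t$ can be compared with direct iteration of $\mathbf{x}_t = A\mathbf{x}_{t-1}$, and the limit can be read off. The Python sketch below uses the eigenvalues, eigenvectors and coordinates found in this example; the function names are illustrative, and comparisons are made up to floating-point rounding.

```python
A = [[0.70, 0.15, 0.30],
     [0.20, 0.80, 0.20],
     [0.10, 0.05, 0.50]]
x0 = [0.5, 0.4, 0.1]

# Eigenvalues, eigenvectors and coordinates b1, b2, b3 from the example.
eigenvalues = [1.0, 0.6, 0.4]
eigenvectors = [[3, 4, 1], [3, -4, 1], [-1, 0, 1]]
coords = [0.125, 0.025, -0.05]


def x_formula(t):
    """x_t = b1 (1)^t v1 + b2 (0.6)^t v2 + b3 (0.4)^t v3."""
    return [sum(b * lam ** t * v[i]
                for b, lam, v in zip(coords, eigenvalues, eigenvectors))
            for i in range(3)]


def x_iterated(t):
    """x_t obtained by applying x_t = A x_{t-1} repeatedly."""
    x = x0
    for _ in range(t):
        x = [sum(a * c for a, c in zip(row, x)) for row in A]
    return x


q = [coords[0] * v for v in eigenvectors[0]]    # the limit b1 v1
shoppers = [round(20000 * p) for p in q]        # long-term numbers of shoppers
print(q, shoppers)
print(x_formula(10), x_iterated(10))
```

The terms involving $(0.6)^t$ and $(0.4)^t$ shrink quickly, so even moderate values of $t$ give vectors very close to $\mathbf{q}$.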


Activity 9.18 Verify that $P^{-1}$ is as stated.


You will have noticed that an essential part of the solution of predicting
a long-term distribution for this example is the fact that the transition
matrix $A$ has an eigenvalue $\lambda = 1$ (of multiplicity 1), and that the other
eigenvalues satisfy $|\lambda_i| < 1$. In this case, as $t$ increases, the
distribution vector $\mathbf{x}_t$ will approach the unique eigenvector
$\mathbf{q}$ for $\lambda = 1$ which is also a distribution vector. The fact that
the entries sum to 1 makes $\mathbf{q}$ unique among the vectors satisfying
$A\mathbf{q} = \mathbf{q}$.

We would like to be able to know that this is the case for any Markov 
chain, but there are some exceptions to this rule. A Markov chain is 
said to be regular if some power of the transition matrix A has strictly 
positive entries (so it has no zero entries). In this case, there will be a 
long-term distribution, as the following theorem implies. 


Theorem 9.19 If $A$ is the transition matrix of a regular Markov chain,
then $\lambda = 1$ is an eigenvalue of multiplicity 1, and all other eigenvalues
satisfy $|\lambda_i| < 1$.

We will not prove this theorem here. However, we will prove a similar,
but weaker result, which makes it clear that the only thing that can go
wrong is for the eigenvalue $\lambda = 1$ to have multiplicity greater than 1.
First, we need a definition.

Definition 9.20 A matrix $C$ is called a stochastic matrix if it has the
following two properties:

(1) The entries of $C$ are all non-negative.
(2) The sum of the entries in each row of $C$ is equal to 1:
\[
c_{i1} + c_{i2} + \cdots + c_{in} = 1.
\]

Note that if $A$ is a transition matrix for a Markov process, then $A^T$ is
a stochastic matrix (because all of its entries are non-negative and the
sum of the entries of each row of $A^T$ is equal to 1).

Matrices $A$ and $A^T$ have the same eigenvalues, because by properties
of transpose and determinant, they have the same characteristic
polynomials (the roots of which are the eigenvalues):
\[
|A - \lambda I| = |(A - \lambda I)^T| = |A^T - \lambda I|.
\]
We will prove the following theorem for stochastic matrices, and then
apply it to transition matrices.


Theorem 9.21 If $C$ is a stochastic matrix, then:

• $\mathbf{v} = (1, 1, \ldots, 1)^T$ is an eigenvector of $C$ with eigenvalue $\lambda = 1$.
• If $\lambda$ is an eigenvalue of $C$, then $|\lambda| \le 1$.


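Proof: Let $C = (c_{ij})$. That $C\mathbf{v} = \mathbf{v}$ follows immediately
from property (2) of the definition of a stochastic matrix, since the $i$th
entry of $C\mathbf{v}$ is
$c_{i1}(1) + c_{i2}(1) + \cdots + c_{in}(1) = 1$.

To prove the second statement, let $\lambda$ be an eigenvalue of $C$, let
$\mathbf{u} \ne \mathbf{0}$ be any vector satisfying
$C\mathbf{u} = \lambda\mathbf{u}$, and let $u_i$ denote the largest
component (in absolute value) of $\mathbf{u}$. To show that $|\lambda| \le 1$, set
\[
\mathbf{w} = \frac{1}{u_i}\mathbf{u}.
\]
Then $C\mathbf{w} = \lambda\mathbf{w}$, $w_i = 1$, and $|w_k| \le 1$ for
$1 \le k \le n$. Consider what the $i$th row of the matrix equation
$C\mathbf{w} = \lambda\mathbf{w}$ tells us. It says that
\[
\lambda w_i = c_{i1}w_1 + c_{i2}w_2 + \cdots + c_{in}w_n,
\]
and hence
\begin{align*}
|\lambda| &= |\lambda w_i| && \text{(since } w_i = 1\text{)} \\
&= |c_{i1}w_1 + c_{i2}w_2 + \cdots + c_{in}w_n| \\
&\le c_{i1}|w_1| + c_{i2}|w_2| + \cdots + c_{in}|w_n| \\
&\le c_{i1} + c_{i2} + \cdots + c_{in} = 1 && \text{(because } |w_k| \le 1\text{)}.
\end{align*}
So we’ve shown that $\lambda = 1$ is an eigenvalue and that all eigenvalues
$\lambda_i$ satisfy $|\lambda_i| \le 1$.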
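
The statements of the theorem can be illustrated on a concrete stochastic matrix, namely the transpose of the supermarket transition matrix from Example 9.14. The Python sketch below uses exact fractions to avoid rounding; the helper `det3` and the variable names are our own, and the three eigenvalues tested are those found in that example.

```python
from fractions import Fraction as F

# Transition matrix A of Example 9.14, with exact entries.
A = [[F(70, 100), F(15, 100), F(30, 100)],
     [F(20, 100), F(80, 100), F(20, 100)],
     [F(10, 100), F(5, 100), F(50, 100)]]

# C = A^T is stochastic: the rows of C are the columns of A.
C = [list(col) for col in zip(*A)]

# First statement: C v = v for v = (1, 1, 1)^T.
v = [1, 1, 1]
Cv = [sum(c * x for c, x in zip(row, v)) for row in C]


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_poly_at(M, lam):
    """Evaluate |M - lam I| for a 3 x 3 matrix M."""
    shifted = [[M[r][s] - (lam if r == s else 0) for s in range(3)]
               for r in range(3)]
    return det3(shifted)


# Second statement: each eigenvalue of C has absolute value at most 1.
eigenvalues = [F(1), F(3, 5), F(2, 5)]
print(Cv, [char_poly_at(C, lam) for lam in eigenvalues])
```

The determinant vanishes at each of the three values, confirming that they are eigenvalues of $C$ (and hence of $A$), and none exceeds 1 in absolute value.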

What does this theorem imply about Markov chains? We saw earlier that
if $A$ is the transition matrix of a Markov chain, then $A^T$ is a stochastic
matrix and also that $A$ and $A^T$ have the same eigenvalues. Therefore,
you can deduce from Theorem 9.21 that:

• $\lambda = 1$ is an eigenvalue of $A$, and
• if $\lambda_i$ is an eigenvalue of $A$ then $|\lambda_i| \le 1$.

The theorem tells us that $\lambda = 1$ is an eigenvalue, but it might have
multiplicity greater than 1, in which case either there would be more
than one (linearly independent) eigenvector corresponding to $\lambda = 1$, or
the matrix might not be diagonalisable.

In order to obtain a long-term distribution, we need to know that
there is only one (linearly independent) eigenvector for the eigenvalue
$\lambda = 1$. So if the eigenvalue $\lambda = 1$ of a transition matrix $A$ of a
Markov chain does have multiplicity 1, then Theorem 9.21 implies all the other
eigenvalues $\lambda_i$ satisfy $|\lambda_i| \le 1$. There will be one
corresponding eigenvector which is also a distribution vector, and provided
$A$ can be diagonalised, we will know that there is a long-term distribution.
This is all we will need in practice.


9.3 Linear systems of differential equations 


This section is aimed at those who will have studied calculus before, 
as many of you have (or will be doing so concurrently with your linear 
algebra studies). But if you have not yet studied calculus, you can simply 
omit this section. 

A differential equation is, broadly speaking, an equation that 
involves a function and its derivatives. We are interested here only 
in very simple types of differential equation and it is quite easy to 
summarise what you need to know so that we do not need a lengthy 
digression into calculus (which would detract from the whole point of 
the exercise, which is to demonstrate the power of diagonalisation). 

For a function $y = y(t)$, the derivative of $y$ will be denoted by
$y' = y'(t)$ or $dy/dt$. The result we will need is the following: if $y(t)$
satisfies the ‘linear’ differential equation
\[
y' = ay,
\]
then the general solution is
\[
y(t) = \beta e^{at} \quad \text{for } \beta \in \mathbb{R}.
\]
If an initial condition $y(0)$ is given, then since $y(0) = \beta e^0 = \beta$, we
have a particular (unique) solution $y(t) = y(0)e^{at}$ to the differential
equation.
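
If you would like numerical reassurance that $y(t) = y(0)e^{at}$ really does satisfy $y' = ay$, the derivative can be approximated by a difference quotient. The Python sketch below does this; the names `solution` and `derivative`, the step size `h` and the sample values of $a$, $y(0)$ and $t$ are all our own choices, and the comparison is only approximate.

```python
import math


def solution(y0, a):
    """Return the function y(t) = y0 * e^(a t)."""
    return lambda t: y0 * math.exp(a * t)


def derivative(f, t, h=1e-6):
    """Approximate f'(t) by a central difference quotient."""
    return (f(t + h) - f(t - h)) / (2 * h)


a, y0 = -1.5, 4.0
y = solution(y0, a)
for t in (0.0, 0.5, 1.0):
    # The two printed numbers should agree to several decimal places.
    print(t, derivative(y, t), a * y(t))
```

This is a check, not a proof: the exact statement follows from the rule for differentiating $e^{at}$.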


Activity 9.22 Check that $y = 3e^{2t}$ is a solution of the differential
equation $y' = 2y$ which satisfies the initial condition $y(0) = 3$.


We will look at systems consisting of these types of differential equa- 
tions. In Section 9.2.4, we used a change of variable technique based on 
diagonalisation to solve systems of difference equations. We can apply 
an analogous technique to solve systems of linear differential equations. 

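In general, a (square) linear system of differential equations for the
functions $y_1(t), y_2(t), \ldots, y_n(t)$ is of the form
\begin{align*}
y_1' &= a_{11}y_1 + a_{12}y_2 + \cdots + a_{1n}y_n \\
y_2' &= a_{21}y_1 + a_{22}y_2 + \cdots + a_{2n}y_n \\
&\ \ \vdots \\
y_n' &= a_{n1}y_1 + a_{n2}y_2 + \cdots + a_{nn}y_n,
\end{align*}
for constants $a_{ij} \in \mathbb{R}$. So such a system takes the form
\[
\mathbf{y}' = A\mathbf{y},
\]
where $A = (a_{ij})$ is an $n \times n$ matrix whose entries are constants (that
is, fixed numbers), and $\mathbf{y} = (y_1, y_2, \ldots, y_n)^T$,
$\mathbf{y}' = (y_1', y_2', \ldots, y_n')^T$ are vectors of functions.

If $A$ is diagonal, the system $\mathbf{y}' = A\mathbf{y}$ is easy to solve. For
instance, suppose
\[
A = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n).
\]
Then the system is precisely
\[
y_1' = \lambda_1 y_1, \quad y_2' = \lambda_2 y_2, \quad \ldots, \quad y_n' = \lambda_n y_n,
\]
and so
\[
y_1 = y_1(0)e^{\lambda_1 t}, \quad y_2 = y_2(0)e^{\lambda_2 t}, \quad \ldots, \quad y_n = y_n(0)e^{\lambda_n t}.
\]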


Since a diagonal system is so easy to solve, it would be very helpful if 
we could reduce our given system to a diagonal one, and this is precisely 
what the method will do in the case where A is diagonalisable. We will 
come back to the general discussion shortly, but for now we explore 
with a simple example. 
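
Example 9.23 Suppose the functions $y_1(t)$ and $y_2(t)$ are related as
follows:
\begin{align*}
y_1' &= 7y_1 - 15y_2 \\
y_2' &= 2y_1 - 4y_2.
\end{align*}
In matrix form, this is $\mathbf{y}' = A\mathbf{y}$, where $A$ is the $2 \times 2$
matrix we considered earlier:
\[
A = \begin{pmatrix} 7 & -15 \\ 2 & -4 \end{pmatrix}.
\]
We’ve seen this matrix is diagonalisable; if
\[
P = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix},
\]
then $P$ is invertible and
\[
P^{-1}AP = D = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.
\]
We now use the matrix $P$ to define new functions $z_1(t), z_2(t)$ by setting
$\mathbf{y} = P\mathbf{z}$ (or equivalently, $\mathbf{z} = P^{-1}\mathbf{y}$); that is,
\[
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
= \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix}
\begin{pmatrix} z_1 \\ z_2 \end{pmatrix},
\]
so that
\begin{align*}
y_1 &= 5z_1 + 3z_2 \\
y_2 &= 2z_1 + z_2.
\end{align*}
By differentiating these equations, we can express $y_1'$ and $y_2'$ in terms
of $z_1'$ and $z_2'$,
\begin{align*}
y_1' &= 5z_1' + 3z_2' \\
y_2' &= 2z_1' + z_2',
\end{align*}
so that $\mathbf{y}' = (P\mathbf{z})' = P\mathbf{z}'$. Then we have
\[
P\mathbf{z}' = \mathbf{y}' = A\mathbf{y} = A(P\mathbf{z}) = AP\mathbf{z}
\]
and hence
\[
\mathbf{z}' = P^{-1}AP\mathbf{z} = D\mathbf{z}.
\]
In other words,
\[
\begin{pmatrix} z_1' \\ z_2' \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}
\begin{pmatrix} z_1 \\ z_2 \end{pmatrix},
\quad\text{that is,}\quad z_1' = z_1, \ z_2' = 2z_2.
\]
So the system for the functions $z_1, z_2$ is diagonal and hence is easily
solved. Having found $z_1, z_2$, we can then find $y_1$ and $y_2$ through the
explicit connection between the two sets of functions, namely
$\mathbf{y} = P\mathbf{z}$.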


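Let us now return to the general technique. Suppose we have the system
$\mathbf{y}' = A\mathbf{y}$, and that $A$ can indeed be diagonalised. Then there
is an invertible matrix $P$ and a diagonal matrix $D$ such that
$P^{-1}AP = D$. Here
\[
P = (\mathbf{v}_1 \ \ldots \ \mathbf{v}_n), \qquad
D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n),
\]
where $\lambda_i$ are the eigenvalues and $\mathbf{v}_i$ corresponding
eigenvectors. Let $\mathbf{z} = P^{-1}\mathbf{y}$ (or, equivalently, let
$\mathbf{y} = P\mathbf{z}$). Then
\[
\mathbf{y}' = (P\mathbf{z})' = P\mathbf{z}',
\]
since $P$ has constant entries.

Activity 9.24 Prove that $(P\mathbf{z})' = P\mathbf{z}'$.

Therefore,
\[
P\mathbf{z}' = A\mathbf{y} = AP\mathbf{z},
\]
and
\[
\mathbf{z}' = P^{-1}AP\mathbf{z} = D\mathbf{z}.
\]
We may now easily solve for $\mathbf{z}$, and hence $\mathbf{y}$.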

We illustrate with an example of a 3 by 3 system of differential
equations, solved using this method. Note carefully how we use the
initial values $y_1(0)$, $y_2(0)$ and $y_3(0)$.

Example 9.25 We find functions $y_1(t), y_2(t), y_3(t)$ such that
$y_1(0) = 2$, $y_2(0) = 1$ and $y_3(0) = 1$ and such that they are related by the
linear system of differential equations,
\begin{align*}
\frac{dy_1}{dt} &= 6y_1 + 13y_2 - 8y_3 \\
\frac{dy_2}{dt} &= 2y_1 + 5y_2 - 2y_3 \\
\frac{dy_3}{dt} &= 7y_1 + 17y_2 - 9y_3.
\end{align*}
We can express this system in matrix form as $\mathbf{y}' = A\mathbf{y}$, where
\[
A = \begin{pmatrix} 6 & 13 & -8 \\ 2 & 5 & -2 \\ 7 & 17 & -9 \end{pmatrix}.
\]
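As we saw earlier (in Exercise 8.5 and in Example 9.10), $P^{-1}AP = D$,
where
\[
P = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix},
\qquad
D = \begin{pmatrix} -2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix}.
\]
We set $\mathbf{y} = P\mathbf{z}$, and substitute into the equation
$\mathbf{y}' = A\mathbf{y}$ to obtain $(P\mathbf{z})' = A(P\mathbf{z})$. That is,
$P\mathbf{z}' = AP\mathbf{z}$ and so
$\mathbf{z}' = P^{-1}AP\mathbf{z} = D\mathbf{z}$. In other words, if
\[
\mathbf{z} = \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix},
\]
then
\[
\begin{pmatrix} z_1' \\ z_2' \\ z_3' \end{pmatrix}
= \begin{pmatrix} -2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix}
\begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix}.
\]
So,
\[
z_1' = -2z_1, \qquad z_2' = z_2, \qquad z_3' = 3z_3.
\]
Therefore,
\[
z_1 = z_1(0)e^{-2t}, \qquad z_2 = z_2(0)e^{t}, \qquad z_3 = z_3(0)e^{3t}.
\]
Then, using $\mathbf{y} = P\mathbf{z}$, we have
\[
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}
= \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} z_1(0)e^{-2t} \\ z_2(0)e^{t} \\ z_3(0)e^{3t} \end{pmatrix}.
\]
It remains to find $z_1(0), z_2(0), z_3(0)$. To do so, we use the given initial
values $y_1(0) = 2$, $y_2(0) = 1$, $y_3(0) = 1$. Since
$\mathbf{y} = P\mathbf{z}$, we can see that $\mathbf{y}(0) = P\mathbf{z}(0)$. We
could use row operations to solve this system to determine $\mathbf{z}(0)$.
Alternatively, we could use $\mathbf{z}(0) = P^{-1}\mathbf{y}(0)$. Perhaps
the first way is generally easier, but in this particular case we already
know the inverse of $P$ from earlier:
\[
P^{-1} = \begin{pmatrix} -1 & -3 & 2 \\ -1 & -1 & 1 \\ 1 & 2 & -1 \end{pmatrix}.
\]
Therefore,
\[
\mathbf{z}(0) = \begin{pmatrix} z_1(0) \\ z_2(0) \\ z_3(0) \end{pmatrix}
= P^{-1}\mathbf{y}(0)
= \begin{pmatrix} -1 & -3 & 2 \\ -1 & -1 & 1 \\ 1 & 2 & -1 \end{pmatrix}
\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} -3 \\ -2 \\ 3 \end{pmatrix}.
\]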
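Therefore, finally,
\begin{align*}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}
&= \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} z_1(0)e^{-2t} \\ z_2(0)e^{t} \\ z_3(0)e^{3t} \end{pmatrix}
= \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} -3e^{-2t} \\ -2e^{t} \\ 3e^{3t} \end{pmatrix} \\
&= \begin{pmatrix} -3e^{-2t} + 2e^{t} + 3e^{3t} \\ -2e^{t} + 3e^{3t} \\ -3e^{-2t} - 2e^{t} + 6e^{3t} \end{pmatrix}.
\end{align*}
The functions are
\begin{align*}
y_1(t) &= -3e^{-2t} + 2e^{t} + 3e^{3t} \\
y_2(t) &= -2e^{t} + 3e^{3t} \\
y_3(t) &= -3e^{-2t} - 2e^{t} + 6e^{3t}.
\end{align*}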


How can we check our solution? First of all, it should satisfy the initial 
conditions. If we substitute $t = 0$ into the equations, we should obtain
the given initial conditions. 


Activity 9.26 Check this! 


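The real check is to look at the derivatives at $t = 0$. We can take the
original system, $\mathbf{y}' = A\mathbf{y}$, and use it to find $\mathbf{y}'(0)$,
\[
\begin{pmatrix} y_1'(0) \\ y_2'(0) \\ y_3'(0) \end{pmatrix}
= \begin{pmatrix} 6 & 13 & -8 \\ 2 & 5 & -2 \\ 7 & 17 & -9 \end{pmatrix}
\begin{pmatrix} y_1(0) \\ y_2(0) \\ y_3(0) \end{pmatrix}
= \begin{pmatrix} 6 & 13 & -8 \\ 2 & 5 & -2 \\ 7 & 17 & -9 \end{pmatrix}
\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 17 \\ 7 \\ 22 \end{pmatrix}.
\]
And we can differentiate our solution to find $\mathbf{y}'$, and then substitute
$t = 0$:
\[
\begin{pmatrix} y_1'(t) \\ y_2'(t) \\ y_3'(t) \end{pmatrix}
= \begin{pmatrix} 6e^{-2t} + 2e^{t} + 9e^{3t} \\ -2e^{t} + 9e^{3t} \\ 6e^{-2t} - 2e^{t} + 18e^{3t} \end{pmatrix}.
\]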

Activity 9.27 Substitute $t = 0$ to obtain $\mathbf{y}'(0)$ and check that it gives
the same answer.
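
The check need not be restricted to $t = 0$. The Python sketch below evaluates both sides of $\mathbf{y}' = A\mathbf{y}$ at a few values of $t$, using the solution of Example 9.25 and its term-by-term derivative; the function names `y` and `y_prime` and the sample values of $t$ are our own choices, and the two sides are compared up to floating-point rounding.

```python
import math

A = [[6, 13, -8],
     [2, 5, -2],
     [7, 17, -9]]


def y(t):
    """The solution (y1, y2, y3) found in Example 9.25."""
    e1, e2, e3 = math.exp(-2 * t), math.exp(t), math.exp(3 * t)
    return [-3 * e1 + 2 * e2 + 3 * e3,
            -2 * e2 + 3 * e3,
            -3 * e1 - 2 * e2 + 6 * e3]


def y_prime(t):
    """Its derivative, obtained by differentiating term by term."""
    e1, e2, e3 = math.exp(-2 * t), math.exp(t), math.exp(3 * t)
    return [6 * e1 + 2 * e2 + 9 * e3,
            -2 * e2 + 9 * e3,
            6 * e1 - 2 * e2 + 18 * e3]


def mat_vec(M, v):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


for t in (0.0, 0.3, 1.0):
    # Left-hand side y'(t) and right-hand side A y(t) should agree.
    print(t, y_prime(t), mat_vec(A, y(t)))
```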




Often it is desirable to find a general solution to a system of differential 
equations, where no initial conditions are given. A general solution will 
have n arbitrary constants, essentially one for each function, so that 
given different initial conditions later, different particular solutions can
be easily obtained. We will show how this works using the system in 
Example 9.25. 


Example 9.28 Let $y_1(t), y_2(t), y_3(t)$ be functions related by the system
of differential equations
\begin{align*}
\frac{dy_1}{dt} &= 6y_1 + 13y_2 - 8y_3 \\
\frac{dy_2}{dt} &= 2y_1 + 5y_2 - 2y_3 \\
\frac{dy_3}{dt} &= 7y_1 + 17y_2 - 9y_3.
\end{align*}

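Let the matrices $A$, $P$ and $D$ be exactly as before in Example 9.25,
so that we still have $P^{-1}AP = D$, and setting $\mathbf{y} = P\mathbf{z}$ to
define new functions $z_1(t), z_2(t), z_3(t)$, we have
\[
\mathbf{y}' = A\mathbf{y} \iff P\mathbf{z}' = AP\mathbf{z} \iff \mathbf{z}' = P^{-1}AP\mathbf{z} = D\mathbf{z}.
\]
So we need to solve the equations
\[
z_1' = -2z_1, \qquad z_2' = z_2, \qquad z_3' = 3z_3
\]
in the absence of specific initial conditions. The general solutions are
\[
z_1 = \alpha e^{-2t}, \qquad z_2 = \beta e^{t}, \qquad z_3 = \gamma e^{3t},
\]
for arbitrary constants $\alpha, \beta, \gamma \in \mathbb{R}$.
Therefore, the general solution of the original system is
\[
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}
= \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} \alpha e^{-2t} \\ \beta e^{t} \\ \gamma e^{3t} \end{pmatrix};
\]
that is,
\begin{align*}
y_1(t) &= \alpha e^{-2t} - \beta e^{t} + \gamma e^{3t} \\
y_2(t) &= \beta e^{t} + \gamma e^{3t} \\
y_3(t) &= \alpha e^{-2t} + \beta e^{t} + 2\gamma e^{3t}
\end{align*}
for $\alpha, \beta, \gamma \in \mathbb{R}$.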

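Using the general solution, you can find particular solutions for any
given initial conditions. For example, using the same initial conditions
$y_1(0) = 2$, $y_2(0) = 1$ and $y_3(0) = 1$ as in Example 9.25, we can
substitute $t = 0$ into the general solution to obtain
\begin{align*}
y_1(0) &= 2 = \alpha - \beta + \gamma \\
y_2(0) &= 1 = \beta + \gamma \\
y_3(0) &= 1 = \alpha + \beta + 2\gamma
\end{align*}
and solve this linear system of equations for $\alpha, \beta, \gamma$. Of course,
this is precisely the same system $\mathbf{y}(0) = P\mathbf{z}(0)$ as before,
with solution $P^{-1}\mathbf{y}(0)$,
\[
\begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix}
= \begin{pmatrix} -1 & -3 & 2 \\ -1 & -1 & 1 \\ 1 & 2 & -1 \end{pmatrix}
\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} -3 \\ -2 \\ 3 \end{pmatrix}.
\]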
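
The passage from a general solution to a particular one is mechanical enough to sketch in code. In the Python sketch below, the function `particular_solution` (a name of our own) takes initial values $\mathbf{y}(0)$, computes $(\alpha, \beta, \gamma) = P^{-1}\mathbf{y}(0)$, and returns these constants together with the resulting function $\mathbf{y}(t) = P\mathbf{z}(t)$. The matrices are those of Example 9.28, and $P^{-1}$ is typed in rather than computed.

```python
import math

P = [[1, -1, 1],
     [0, 1, 1],
     [1, 1, 2]]
P_inv = [[-1, -3, 2],
         [-1, -1, 1],
         [1, 2, -1]]
eigenvalues = [-2, 1, 3]


def mat_vec(M, v):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


def particular_solution(y_initial):
    """Return the constants (alpha, beta, gamma) and the function y(t)."""
    constants = mat_vec(P_inv, y_initial)          # z(0) = P^{-1} y(0)

    def y(t):
        # z_i(t) = z_i(0) e^{lambda_i t}, then y = P z.
        z = [c * math.exp(lam * t) for c, lam in zip(constants, eigenvalues)]
        return mat_vec(P, z)

    return constants, y


constants, y = particular_solution([2, 1, 1])
print(constants)   # alpha, beta, gamma for the initial values above
print(y(0.0))      # should reproduce the initial values
```

Different initial conditions only change the input vector; the matrices $P$, $P^{-1}$ and the eigenvalues are reused unchanged.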


Activity 9.29 Find the particular solution of the system of differential
equations in Example 9.28 which satisfies the initial conditions
$y_1(0) = 1$, $y_2(0) = 1$ and $y_3(0) = 0$. Compare your result with the
solution of difference equations in Example 9.10 which uses the same initial
conditions for sequences. What do you notice? Why does this happen?


9.4 Learning outcomes 


You should now be able to: 


• calculate the general $n$th power of a diagonalisable matrix using
diagonalisation

• solve systems of difference equations in which the underlying matrix
is diagonalisable, by using both the matrix powers method and the
change of variable method

• know what is meant by a Markov chain and its properties, and be
able to find the long-term distribution

• solve systems of differential equations in which the underlying
matrix is diagonalisable, by using the change of variable method.


9.5 Comments on activities 


Activity 9.2 Take any $2 \times 2$ diagonal matrix $D$. Calculate $D^2$ and $D^3$,
and observe what happens. Then see how this generalises.


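Activity 9.13 We have
\[
\mathbf{x}_1 = A\mathbf{x}_0
= \begin{pmatrix} 4 & 0 & 4 \\ 0 & 4 & 4 \\ 4 & 4 & 8 \end{pmatrix}
\begin{pmatrix} 6 \\ 12 \\ 12 \end{pmatrix}
= \begin{pmatrix} 72 \\ 96 \\ 168 \end{pmatrix}
= \begin{pmatrix} -3(4) + 7(12) \\ 3(4) + 7(12) \\ 14(12) \end{pmatrix}.
\]
If you keep $0^t$ as part of the solution, the solution is
\[
\begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix}
= \begin{pmatrix} -3(4^t) + 2(0^t) + 7(12^t) \\ 3(4^t) + 2(0^t) + 7(12^t) \\ -2(0^t) + 14(12^t) \end{pmatrix}.
\]
Using $0^t = 0$ for $t \ge 1$ and $0^0 = 1$, this gives the correct $\mathbf{x}_0$
and the same solution for $t \ge 1$.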
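
This can also be confirmed by machine. Python happens to follow the same convention, evaluating `0 ** 0` as 1 and `0 ** t` as 0 for positive integers `t`, so the formula with the $0^t$ terms retained can be typed in directly. The sketch below (with function names of our own choosing) compares it with repeated multiplication by $A$ for $t \ge 0$.

```python
A = [[4, 0, 4],
     [0, 4, 4],
     [4, 4, 8]]
x0 = [6, 12, 12]


def formula(t):
    """Solution with the 0^t terms kept, intended to hold for all t >= 0."""
    z = 0 ** t   # equals 1 when t == 0 and 0 when t >= 1
    return [-3 * 4 ** t + 2 * z + 7 * 12 ** t,
            3 * 4 ** t + 2 * z + 7 * 12 ** t,
            -2 * z + 14 * 12 ** t]


def iterate(t):
    """Apply x_{t+1} = A x_t repeatedly, starting from x0."""
    x = x0
    for _ in range(t):
        x = [sum(a * c for a, c in zip(row, x)) for row in A]
    return x


for t in range(4):
    print(t, formula(t), iterate(t))
```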


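Activity 9.16 First multiply the two matrices on the right to obtain
\[
D^t(P^{-1}\mathbf{x}_0) =
\begin{pmatrix} \lambda_1^t & 0 & \cdots & 0 \\ 0 & \lambda_2^t & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^t \end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}
= \begin{pmatrix} b_1\lambda_1^t \\ b_2\lambda_2^t \\ \vdots \\ b_n\lambda_n^t \end{pmatrix}.
\]
Then express the product $P(D^t(P^{-1}\mathbf{x}_0))$ as a linear combination of the
columns of $P$ (see Activity 4.14),
\begin{align*}
P(D^t(P^{-1}\mathbf{x}_0)) &=
\begin{pmatrix} | & | & & | \\ \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \\ | & | & & | \end{pmatrix}
\begin{pmatrix} b_1\lambda_1^t \\ b_2\lambda_2^t \\ \vdots \\ b_n\lambda_n^t \end{pmatrix} \\
&= b_1\lambda_1^t\mathbf{v}_1 + b_2\lambda_2^t\mathbf{v}_2 + \cdots + b_n\lambda_n^t\mathbf{v}_n.
\end{align*}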

Activity 9.22 It is clear that $y(0) = 3e^0 = 3$. Furthermore,
\[
y' = 6e^{2t} = 2(3e^{2t}) = 2y.
\]


Activity 9.24 Each row of the $n \times 1$ matrix $P\mathbf{z}$ is a linear
combination of the functions $z_1(t), z_2(t), \ldots, z_n(t)$. For example, row $i$
of $P\mathbf{z}$ is
\[
p_{i1}z_1(t) + p_{i2}z_2(t) + \cdots + p_{in}z_n(t).
\]
The rows of the matrix $(P\mathbf{z})'$ are the derivatives of these linear
combinations of functions, so the $i$th row is
\[
(p_{i1}z_1(t) + p_{i2}z_2(t) + \cdots + p_{in}z_n(t))'
= p_{i1}z_1'(t) + p_{i2}z_2'(t) + \cdots + p_{in}z_n'(t),
\]
using the properties of differentiation, since the entries $p_{ij}$ of $P$ are
constants. But
\[
p_{i1}z_1'(t) + p_{i2}z_2'(t) + \cdots + p_{in}z_n'(t)
\]
is just the $i$th row of the $n \times 1$ matrix $P\mathbf{z}'$, so these matrices
are equal.


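Activity 9.29 For the initial conditions $y_1(0) = 1$, $y_2(0) = 1$, $y_3(0) = 0$
the constants $\alpha, \beta, \gamma$ are
\[
\begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix}
= P^{-1}\mathbf{y}(0)
= \begin{pmatrix} -1 & -3 & 2 \\ -1 & -1 & 1 \\ 1 & 2 & -1 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}
= \begin{pmatrix} -4 \\ -2 \\ 3 \end{pmatrix},
\]
so the solution is
\begin{align*}
y_1(t) &= -4e^{-2t} + 2e^{t} + 3e^{3t} \\
y_2(t) &= -2e^{t} + 3e^{3t} \\
y_3(t) &= -4e^{-2t} - 2e^{t} + 6e^{3t}.
\end{align*}
Compare this with the solution of the difference equations in
Example 9.10 on page 288,
\[
\begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix}
= \begin{pmatrix} -4(-2)^t + 2 + 3(3^t) \\ -2 + 3(3^t) \\ -4(-2)^t - 2 + 6(3^t) \end{pmatrix}.
\]
The two solutions are essentially the ‘same’, with the functions $e^{\lambda t}$
replaced by $\lambda^t$. Why? The coefficient matrix $A$ is the same for both
systems, and so are the matrices $P$ and $D$. So the change of basis
used to solve the systems is the same. We are changing from a system
formulated in standard coordinates to one in coordinates of the basis of
eigenvectors of $A$ in order to find the solution.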


9.6 Exercises 


Exercise 9.1 Given the matrix
\[
A = \begin{pmatrix} 4 & 5 \\ -1 & -2 \end{pmatrix},
\]
find $A^n$ for any positive integer $n$.
Deduce from your result that the expression $(-1)^k - 3^k$ is divisible
by 4 for all $k \ge 1$.


Exercise 9.2 Solve the following system of difference equations. 


Xt = X; + 4y 


1 
Yt+1 = 34t, 


given that $x_0 = y_0 = 1000$.


Exercise 9.3 Sequences $x_t, y_t, z_t$ are defined by $x_0 = -1$, $y_0 = 2$,
$z_0 = 1$ and
\begin{align*}
x_{t+1} &= 7x_t - 3z_t \\
y_{t+1} &= x_t + 6y_t + 5z_t \\
z_{t+1} &= 5x_t - z_t.
\end{align*}
Find formulae for $x_t$, $y_t$, and $z_t$.
Check your solution.

Exercise 9.4 Given that


C C) C) 


are eigenvectors of the matrix
\[
A = \begin{pmatrix} 1 & -2 & -6 \\ 2 & 5 & 6 \\ -2 & -2 & -3 \end{pmatrix},
\]
find an invertible matrix $P$ such that $P^{-1}AP$ is diagonal. Using the
method of changing variables, find sequences $x_t, y_t, z_t$ satisfying the
equations
\begin{align*}
x_{t+1} &= x_t - 2y_t - 6z_t \\
y_{t+1} &= 2x_t + 5y_t + 6z_t \\
z_{t+1} &= -2x_t - 2y_t - 3z_t
\end{align*}
and with the property that $x_0 = y_0 = 1$ and $z_0 = 0$.
Find the term $x_5$.


Exercise 9.5 At any time $t$, the total population of 210 people of Desert
Island is divided into those living by the sea ($x_t$) and those living in the
oasis ($y_t$). Initially, half the population is living by the sea, and half in
the oasis. Yearly population movements are given by
\[
\mathbf{x}_t = A\mathbf{x}_{t-1} \quad\text{where}\quad
A = \begin{pmatrix} 0.6 & 0.2 \\ 0.4 & 0.8 \end{pmatrix}, \quad
\mathbf{x}_t = \begin{pmatrix} x_t \\ y_t \end{pmatrix}.
\]
Show this is a Markov process and interpret the yearly population
movements from the matrix $A$.
Find expressions for $x_t$ and $y_t$ at any future time $t$.

Determine the ‘long-term’ population distribution; that is, find what
happens to $\mathbf{x}_t$ as $t \to \infty$.

Exercise 9.6 Consider the matrices
\[
A = \begin{pmatrix} 0.7 & 0.2 & 0.2 \\ 0 & 0.2 & 0.4 \\ 0.3 & 0.6 & 0.4 \end{pmatrix}, \quad
B = \begin{pmatrix} 7 & 2 & 2 \\ 0 & 2 & 4 \\ 3 & 6 & 4 \end{pmatrix}, \quad
\mathbf{x}_t = \begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix}.
\]
(a) What is the relationship between the matrices $A$ and $B$?

Show that $A$ and $B$ have the same eigenvectors. What is the
relationship between the corresponding eigenvalues?

Show that the system $\mathbf{x}_t = A\mathbf{x}_{t-1}$ is a Markov chain by
showing that the matrix $A$ satisfies the two conditions to be the transition
matrix of a Markov chain.

Deduce that $\lambda = 10$ is an eigenvalue of $B$.
(b) Find an eigenvector of $B$ corresponding to the eigenvalue
$\lambda = 10$.

Diagonalise the matrix $B$: find an invertible matrix $P$ and a diagonal
matrix $D$ such that $P^{-1}BP = D$. Check that $BP = PD$.

Write down the eigenvalues and corresponding eigenvectors of $A$.

(c) An economic model of employment of a fixed group of 1000 workers
assumes that in any year $t$, an individual is either employed full-time,
employed part-time or unemployed. Let $x_t$ denote the percentage (as
a decimal) of full-time workers in year $t$, $y_t$ the percentage working
part-time and $z_t$ the percentage who are unemployed. Then according
to this model, the probabilities that a worker will change from one
state to another in year $t$ are given by the matrix $A$ above, so that
$\mathbf{x}_t = A\mathbf{x}_{t-1}$. Initially, 200 are employed full-time and
300 are employed part-time.

Find the long-term population distribution of this system. Eventually,
what number of workers are employed, either full or part-time?


Exercise 9.7 Suppose functions $y_1(t), y_2(t)$ are related by the following
system of differential equations:
\begin{align*}
y_1' &= 4y_1 + 5y_2 \\
y_2' &= -y_1 - 2y_2.
\end{align*}
Find the solutions to these equations that satisfy $y_1(0) = 2$, $y_2(0) = 6$.

Check your answer.


Exercise 9.8 Find the general solution of the following system of dif- 
ferential equations: 


2 ENT E 
di yı T y2 y3 
dyz 

ae = —6y; + 2y2 + 6y3 
t 

dy3 

aa + 

di y2 T V3 


for functions y;(t), y2(t), y3(t), t € R. 


Exercise 9.9 Find functions $y_1(t), y_2(t), y_3(t)$ satisfying the system of
differential equations:
\begin{align*}
y_1' &= 4y_1 + 4y_3 \\
y_2' &= 4y_2 + 4y_3 \\
y_3' &= 4y_1 + 4y_2 + 8y_3
\end{align*}
and with $y_1(0) = 6$, $y_2(0) = 12$, $y_3(0) = 12$.




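Exercise 9.10 Consider

$$A = \begin{pmatrix} 5 & -8 & -4 \\ 3 & -5 & -3 \\ -1 & 2 & 2 \end{pmatrix}, \qquad \mathbf{v}_1 = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}.$$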


Find a basis for the null space of A, N(A).

Show that the vector $\mathbf{v}_1$ is an eigenvector of A and find the
corresponding eigenvalue. Find all the eigenvectors of A which correspond
to this eigenvalue. Hence find an invertible matrix P and a diagonal
matrix D such that $P^{-1}AP = D$.

Find $A^n$. What do you notice about the matrix A and its powers $A^n$?

Find the solution to the system of difference equations given by
$\mathbf{x}_{t+1} = A\mathbf{x}_t$ for sequences $\mathbf{x}_t = (x_t, y_t, z_t)^T$, $t \in \mathbb{Z}$, $t \ge 0$, with initial
conditions $x_0 = 1$, $y_0 = 1$, $z_0 = 1$. Write down the first four terms of
each sequence.


9.7 Problems 


Problem 9.1 Find 4° if 4 = p J ; 


Problem 9.2 Find sequences $x_t$ and $y_t$ which satisfy the following
system of difference equations

$$x_{t+1} = x_t + 4y_t$$
$$y_{t+1} = 3x_t + 2y_t,$$

and the initial conditions $x_0 = 1$, $y_0 = 0$.
Find the values of $x_5$ and $y_5$.


Problem 9.3 Find sequences $x_t$, $y_t$, $z_t$ satisfying the equations


X41 = 4x, + 3y: — 72; 
Yiyi = Xy + 2yr +2; 
Zt+1 = 2X, + 2y; — SZ, 


and with the property that $x_0 = 4$, $y_0 = 5$ and $z_0 = 1$. (See Problem 8.9.)
Check your answer by calculating $x_1$, $y_1$, $z_1$ from both the solution
and the original system.


Problem 9.4 A Markov process satisfies the difference equation
$\mathbf{x}_t = A\mathbf{x}_{t-1}$, where

$$A = \begin{pmatrix} 0.7 & 0.6 \\ 0.3 & 0.4 \end{pmatrix}, \qquad \mathbf{x}_0 = \begin{pmatrix} 0.6 \\ 0.4 \end{pmatrix}.$$

Solve the equation and find an expression for $\mathbf{x}_t$ as a linear combination
of the eigenvectors of A.

Use this to predict the ‘long term’ distribution; that is, find what
happens to $\mathbf{x}_t$ as $t \to \infty$.


Problem 9.5 Consider the system of equations $\mathbf{x}_t = A\mathbf{x}_{t-1}$ where


1 1 
0 7 7 Xt 
1 1 
A = 2 3 0 5 X; = Yı 
1 1 
7 9 3 Z 


State the two conditions satisfied by the matrix A so that it is the 
transition matrix of a Markov process. What can you conclude about 
the eigenvalues of the matrix A? 

Find an invertible matrix P and a diagonal matrix D such that
$P^{-1}AP = D$.

Assume that the system represents a total population of 6000 members
distributed into three states, where $x_t$ is the number of members in
state one at time t, $y_t$ is the number in state two, and $z_t$ is the number in
state three. Initially 1/6 of the total population is in state one, 1/3 is in
state two, and 1/2 is in state three. Find the long term population
distribution of this system. State clearly the expected number of members
which will eventually be in each of the three states.


Problem 9.6 The population of osprey eagles at a certain lake is dying 
out. Each year the new population is only 60% of the previous year’s 
population. 

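(a) Conservationists introduce a new species of trout into the lake and
find that the populations satisfy the following system of difference
equations, where $x_t$ is the number of osprey in year t and $y_t$ is the
number of trout in year t: $\mathbf{x}_t = A\mathbf{x}_{t-1}$, where

$$\mathbf{x}_t = \begin{pmatrix} x_t \\ y_t \end{pmatrix}, \qquad A = \begin{pmatrix} 0.6 & 0.2 \\ -0.25 & 1.2 \end{pmatrix}, \qquad \mathbf{x}_0 = \begin{pmatrix} 20 \\ 100 \end{pmatrix}.$$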


Give a reason why this system of difference equations is not a Markov 
process. Describe in words how each of the populations depends on 
the previous year’s populations. 

Solve the system of difference equations. Show that the situation
is not stable: that according to this model both osprey and trout will
increase without bound as $t \to \infty$. What will be the eventual ratio of
osprey to trout?

(b) In order to have the populations of osprey and trout achieve a
steady state, they decide to allow an amount of fishing each year, based
on the number of osprey in the previous year. The new equations are
$\mathbf{x}_t = B\mathbf{x}_{t-1}$, where

$$\mathbf{x}_t = \begin{pmatrix} x_t \\ y_t \end{pmatrix}, \qquad B = \begin{pmatrix} 0.6 & 0.2 \\ -\alpha & 1.2 \end{pmatrix}, \qquad \mathbf{x}_0 = \begin{pmatrix} 20 \\ 100 \end{pmatrix},$$

and $\alpha > 0$ is a constant to be determined.

What property of the transition matrix of a Markov process deter- 
mines that there is a (finite, non-zero) long-term distribution? Deduce 
a condition on the eigenvalues of the matrix B to produce the same 
effect. Then find the value of a which satisfies this condition. 

Show that for this value of $\alpha$ the population now reaches a steady
state as $t \to \infty$ and determine what this stable population of osprey
and trout will be.


Problem 9.7 Find the general solution of the following system of linear
differential equations:

$$y_1'(t) = y_1(t) + 4y_2(t)$$
$$y_2'(t) = 3y_1(t) + 2y_2(t).$$

Then find the unique solution satisfying the initial conditions $y_1(0) = 1$
and $y_2(0) = 0$.
Check your solution by finding the values of $y_1'(0)$ and $y_2'(0)$.


Problem 9.8 Write the system of differential equations

$$y_1'(t) = 3y_1(t) + 2y_2(t)$$
$$y_2'(t) = 2y_1(t) + 6y_2(t), \qquad \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix},$$

in matrix form, as $\mathbf{y}' = A\mathbf{y}$. Find the solution which satisfies the initial
conditions $y_1(0) = 5$, $y_2(0) = 5$.


Problem 9.9 Diagonalise the matrix

$$A = \begin{pmatrix} -1 & 3 & 0 \\ 0 & 2 & 0 \\ -3 & 3 & 2 \end{pmatrix}.$$

Write out the system of linear differential equations given by $\mathbf{y}' = A\mathbf{y}$,
where $\mathbf{y} = (y_1(t), y_2(t), y_3(t))^T$, and find the general solution.


Problem 9.10 Show that the vectors 


1) 0) (3) 


form a basis $B = \{v_1, v_2, v_3\}$ of $\mathbb{R}^3$.

(a) If $w = (2, -3, 1)^T$, find $[w]_B$, the B coordinates of w.

(b) Find a matrix A for which $v_1$ is an eigenvector corresponding to the
eigenvalue $\lambda_1 = 1$, $v_2$ is an eigenvector with eigenvalue $\lambda_2 = 2$,
and $v_3$ is an eigenvector with eigenvalue $\lambda_3 = 3$. Verify that your
matrix A satisfies these conditions.

(c) Find a general solution of the system of differential equations

$$\mathbf{y}' = A\mathbf{y}$$

where A is the matrix in part (b) and $\mathbf{y} = (y_1(t), y_2(t), y_3(t))^T$,
$t \in \mathbb{R}$, is a vector of functions.

Then find the unique solution which satisfies $\mathbf{y}(0) = w$, where
w is the vector in part (a).


10 


Inner products and 
orthogonality 


In this chapter, we develop further some of the key geometrical ideas 
about vectors, specifically the concepts of inner products and orthog- 
onality of vectors. In Chapter 1, we saw how the inner product can be 
useful in thinking about the geometry of vectors. We now investigate 
how these concepts can be extended to a general vector space. 


10.1 Inner products 


10.1.1 The inner product of real n-vectors 


In Chapter 1, we looked at the inner product of vectors in $\mathbb{R}^n$. Recall
that, for $x, y \in \mathbb{R}^n$, the inner product (sometimes called the dot product
or scalar product) is defined to be the number $\langle x, y \rangle$ given by

$$\langle x, y \rangle = x^Ty = x_1y_1 + x_2y_2 + \cdots + x_ny_n.$$

This is often referred to as the standard or Euclidean inner product. 

We re-iterate that it is important to realise that the inner product 
is just a number, not another vector or a matrix. The inner product on 
R” satisfies certain basic properties and is, as we have seen, closely 
linked with the geometric concepts of length and angle. This provides 
the background for generalising these concepts to any vector space V, 
as we shall see in the next section. 

It is easily verified (using Theorem 1.36) that for all $x, y, z \in \mathbb{R}^n$
and for all $\alpha, \beta \in \mathbb{R}$, the inner product satisfies the following properties:

(i) $\langle x, y \rangle = \langle y, x \rangle$.
(ii) $\langle \alpha x + \beta y, z \rangle = \alpha\langle x, z \rangle + \beta\langle y, z \rangle$.
(iii) $\langle x, x \rangle \ge 0$, and $\langle x, x \rangle = 0$ if and only if $x = \mathbf{0}$.

We have seen that the length, $\|a\|$, of a vector a satisfies $\|a\|^2 = \langle a, a \rangle$.
We also noted that if a, b are two vectors in $\mathbb{R}^2$ and $\theta$ is the angle between
them, then $\langle a, b \rangle = \|a\|\,\|b\| \cos\theta$. In particular, non-zero vectors a and
b are orthogonal (or perpendicular) if and only if $\langle a, b \rangle = 0$.
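
The standard inner product, the length and the angle can be computed directly from these formulas. The following is a minimal sketch in plain Python; the function names `inner`, `norm` and `angle` and the sample vectors are illustrative choices, not notation from the text.

```python
import math

def inner(x, y):
    """Standard inner product x1*y1 + ... + xn*yn of two real n-vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of entries")
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    """Length of x, using ||x||^2 = <x, x>."""
    return math.sqrt(inner(x, x))

def angle(x, y):
    """Angle (radians) between non-zero x and y, from <x, y> = ||x|| ||y|| cos(theta)."""
    cos_theta = inner(x, y) / (norm(x) * norm(y))
    # Guard against rounding pushing the ratio just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_theta)))

a = [1.0, 0.0]
b = [1.0, 1.0]
c = [-2.0, 2.0]

print(inner(a, b) == inner(b, a))    # property (i): symmetry
print(math.degrees(angle(a, b)))     # approximately 45
print(inner(b, c))                   # 0.0, so b and c are orthogonal
```

Note that `inner` returns a single number, in line with the remark that the inner product is a number and not a vector or a matrix.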


10.1.2 Inner products more generally 


There is a more general concept of inner product than the one we met 
earlier, and this is very important. It is ‘more general’ in two ways: first, 
it enables us to say what we mean by an inner product on any vector 
space, and not just IR”, and, second, it allows the possibility of inner 
products on R” that are different from the standard one. 


Definition 10.1 (Inner product) Let V be a vector space (over the real 
numbers). An inner product on V is a mapping from (or operation on) 
pairs of vectors x, y to the real numbers, the result of which is a real 
number denoted (x, y), which satisfies the following properties: 


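(i) $\langle x, y \rangle = \langle y, x \rangle$ for all $x, y \in V$.
(ii) $\langle \alpha x + \beta y, z \rangle = \alpha\langle x, z \rangle + \beta\langle y, z \rangle$ for all $x, y, z \in V$ and all
$\alpha, \beta \in \mathbb{R}$.
(iii) $\langle x, x \rangle \ge 0$ for all $x \in V$, and $\langle x, x \rangle = 0$ if and only if $x = \mathbf{0}$, the
zero vector of the vector space.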


Some other basic facts follow immediately from this definition, for 
example 


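$$\langle z, \alpha x + \beta y \rangle = \alpha\langle z, x \rangle + \beta\langle z, y \rangle.$$

Activity 10.2 Prove that $\langle z, \alpha x + \beta y \rangle = \alpha\langle z, x \rangle + \beta\langle z, y \rangle$.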


Of course, given what we noted above, it is clear that the standard inner 
product on R” is indeed an inner product according to this more general 
definition. This new, more general, abstract definition, though, applies 
to more than just the vector space IR”, and there is some advantage in 
developing results in terms of the general notion of inner product. If a 
vector space has an inner product defined on it, we refer to it as an inner 
product space. 


Example 10.3 (This is a deliberately strange example. Its purpose is to 
illustrate how we can define inner products in non-standard ways, which 
is why we’ve chosen it.) Suppose that V is the vector space consisting 
of all real polynomial functions of degree at most n; that is, V consists 
of all functions $p : x \mapsto p(x)$ of the form

$$p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n, \qquad a_0, a_1, \ldots, a_n \in \mathbb{R}.$$

The addition and scalar multiplication are, as usual, defined pointwise.
(Recall that this means that $(p + q)(x) = p(x) + q(x)$ and $(\alpha p)(x) = \alpha p(x)$.)
Let $x_1, x_2, \ldots, x_{n+1}$ be $n + 1$ fixed, different, real numbers, and
define, for $p, q \in V$,

$$\langle p, q \rangle = \sum_{i=1}^{n+1} p(x_i)q(x_i).$$

Then this is an inner product. To see this, we check the properties in the
definition of an inner product. Property (i) is clear. For (iii), we have

$$\langle p, p \rangle = \sum_{i=1}^{n+1} p(x_i)^2 \ge 0.$$

Clearly, if p is the zero vector of the vector space (which is the
identically-zero function), then $\langle p, p \rangle = 0$. To finish verifying (iii), we
need to check that if $\langle p, p \rangle = 0$, then p must be the zero function. Now,
$\langle p, p \rangle = 0$ must mean that $p(x_i) = 0$ for $i = 1, 2, \ldots, n + 1$. So $p(x)$
has $n + 1$ different roots. But $p(x)$ has degree no more than n, so p must
be the identically-zero function. (A non-zero polynomial of degree at
most n has no more than n distinct roots.) Part (ii) is left to you.
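
To see this inner product in action, here is a small sketch in plain Python for the case n = 2. The representation of a polynomial as a list of coefficients, the helper names `evaluate` and `poly_inner`, and the chosen points 0, 1, 2 are all assumptions made for illustration. The last line shows why n + 1 distinct points are needed: with only two points, a non-zero polynomial can have "length" zero, so property (iii) would fail.

```python
def evaluate(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x^n, where coeffs = [a0, a1, ..., an]."""
    value = 0
    for a in reversed(coeffs):      # Horner's rule
        value = value * x + a
    return value

def poly_inner(p, q, points):
    """Sum of p(x_i) * q(x_i) over the fixed evaluation points x_i."""
    return sum(evaluate(p, x) * evaluate(q, x) for x in points)

points = [0, 1, 2]      # n + 1 = 3 distinct points for degree at most n = 2
p = [-1, 0, 1]          # p(x) = x^2 - 1
q = [0, 1, 0]           # q(x) = x

print(poly_inner(p, q, points))     # p(0)q(0) + p(1)q(1) + p(2)q(2) = 6
print(poly_inner(p, p, points))     # 1 + 0 + 9 = 10
print(poly_inner(p, p, [1, -1]))    # 0, although p is not the zero polynomial
```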


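Activity 10.4 Prove that, for any $\alpha, \beta \in \mathbb{R}$ and any $p, q, r \in V$,

$$\langle \alpha p + \beta q, r \rangle = \alpha\langle p, r \rangle + \beta\langle q, r \rangle.$$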


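Example 10.5 Let’s define, for $x, y \in \mathbb{R}^2$,

$$\langle x, y \rangle = x_1y_1 + 2x_2y_2.$$

Then this is an inner product. It is very easy to see that $\langle x, y \rangle = \langle y, x \rangle$.
It is straightforward to check that $\langle \alpha x + \beta y, z \rangle = \alpha\langle x, z \rangle + \beta\langle y, z \rangle$.
Finally, we have $\langle x, x \rangle = x_1^2 + 2x_2^2$, so $\langle x, x \rangle \ge 0$ and, furthermore,
$\langle x, x \rangle = 0$ if and only if $x_1 = x_2 = 0$, meaning $x = \mathbf{0}$.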


Example 10.3 shows how we may define an inner product on a vector
space other than $\mathbb{R}^n$, and Example 10.5 shows how we may define an
inner product on $\mathbb{R}^2$ which is different from the standard Euclidean
inner product (whose value would simply be $x_1y_1 + x_2y_2$).
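
The difference between the two inner products on $\mathbb{R}^2$ is easy to see numerically. In this short Python sketch the function names and the sample vectors are illustrative assumptions; the formulas are the standard inner product and the one from Example 10.5.

```python
def standard_inner(x, y):
    """Standard Euclidean inner product on R^2."""
    return x[0] * y[0] + x[1] * y[1]

def weighted_inner(x, y):
    """Inner product of Example 10.5: x1*y1 + 2*x2*y2."""
    return x[0] * y[0] + 2 * x[1] * y[1]

x = (1, 1)
y = (2, -1)
z = (4, 5)

print(standard_inner(x, y))     # 2 - 1 = 1
print(weighted_inner(x, y))     # 2 - 2 = 0
print(weighted_inner(x, x))     # 1 + 2 = 3

# Spot check of property (ii) with alpha = 3, beta = -2.
alpha, beta = 3, -2
combo = (alpha * x[0] + beta * y[0], alpha * x[1] + beta * y[1])
print(weighted_inner(combo, z) ==
      alpha * weighted_inner(x, z) + beta * weighted_inner(y, z))
```

The same pair of vectors has inner product 1 in one case and 0 in the other, so geometric statements about vectors always depend on which inner product is in use.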

10.1.3 Norms in a vector space


For any x in an inner product space V, the inner product $\langle x, x \rangle$ is
non-negative (by definition), so we may take its
square root (obtaining a real number). We define the norm or length
$\|x\|$ of a vector x to be this real number:


Definition 10.6 (Norm) Suppose that V is an inner product space and
x is a vector in V. Then the norm, or length, of x, denoted $\|x\|$, is

$$\|x\| = \sqrt{\langle x, x \rangle}.$$


For example, for the standard inner product on $\mathbb{R}^n$,

$$\langle x, x \rangle = x_1^2 + x_2^2 + \cdots + x_n^2$$

(which is clearly non-negative since it is a sum of squares), the norm is
the standard Euclidean length of a vector:

$$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.$$


We say that a vector v is a unit vector if it has norm 1. If $v \ne \mathbf{0}$, then
it is a simple matter to create a unit vector in the same direction as v.
This is the vector

$$u = \frac{1}{\|v\|}\,v.$$

The process of constructing u from v is known as normalising v.
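
Normalising is a one-line computation once a norm is available. The sketch below, in plain Python, takes the inner product as a parameter so that the same code works for non-standard inner products; the names `norm`, `normalise` and `weighted_inner` are illustrative assumptions.

```python
import math

def standard_inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def weighted_inner(x, y):
    """The inner product x1*y1 + 2*x2*y2 on R^2."""
    return x[0] * y[0] + 2 * x[1] * y[1]

def norm(x, inner=standard_inner):
    """Norm induced by the given inner product: sqrt(<x, x>)."""
    return math.sqrt(inner(x, x))

def normalise(v, inner=standard_inner):
    """Return the unit vector (1 / ||v||) v; the zero vector has no direction."""
    length = norm(v, inner)
    if length == 0:
        raise ValueError("the zero vector cannot be normalised")
    return [vi / length for vi in v]

v = [3.0, 4.0]
u = normalise(v)
print(u, norm(u))                          # roughly [0.6, 0.8] with norm 1
print(norm([1.0, 1.0]))                    # sqrt(2) under the standard inner product
print(norm([1.0, 1.0], weighted_inner))    # sqrt(3) under the weighted one
```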


10.1.4 The Cauchy—Schwarz inequality 
This important inequality will enable us to apply the geometric intuition 
we have developed to a much more general, completely abstract, setting. 


Theorem 10.7 (Cauchy-Schwarz inequality) Suppose that V is an
inner product space. Then

$$|\langle x, y \rangle| \le \|x\|\,\|y\|$$

for all $x, y \in V$.


Proof: Let x, y be any two vectors of V. For any real number $\alpha$, we
consider the vector $\alpha x + y$. Certainly, $\|\alpha x + y\|^2 \ge 0$ for all $\alpha$. But

$$\begin{aligned}
\|\alpha x + y\|^2 &= \langle \alpha x + y, \alpha x + y \rangle \\
&= \alpha^2\langle x, x \rangle + \alpha\langle x, y \rangle + \alpha\langle y, x \rangle + \langle y, y \rangle \\
&= \alpha^2\|x\|^2 + 2\alpha\langle x, y \rangle + \|y\|^2.
\end{aligned}$$

Now, this quadratic expression in $\alpha$ is non-negative for all $\alpha$. Generally,
we know that if a quadratic expression $at^2 + bt + c$ is non-negative for
all t, then $b^2 - 4ac \le 0$.


Activity 10.8 Why is this true? 


Applying this observation to the above quadratic expression in $\alpha$, we
see that we must have

$$(2\langle x, y \rangle)^2 - 4\|x\|^2\|y\|^2 \le 0,$$

or

$$\langle x, y \rangle^2 \le \|x\|^2\|y\|^2.$$

Taking the square root of each side, we obtain

$$|\langle x, y \rangle| \le \|x\|\,\|y\|,$$

which is what we need.
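
A numerical experiment is no substitute for the proof, but it can make the inequality concrete. The following Python sketch (function names, the random seed and the sample sizes are illustrative assumptions) computes the gap $\|x\|\,\|y\| - |\langle x, y \rangle|$ for many random vectors under the standard inner product, and for one pair of parallel vectors, where the gap is zero up to rounding.

```python
import math
import random

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x))

def cauchy_schwarz_gap(x, y):
    """||x|| ||y|| - |<x, y>|, which the theorem says is never negative."""
    return norm(x) * norm(y) - abs(inner(x, y))

random.seed(0)   # fixed seed so that the experiment is repeatable
gaps = []
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(5)]
    y = [random.uniform(-10, 10) for _ in range(5)]
    gaps.append(cauchy_schwarz_gap(x, y))

smallest_gap = min(gaps)
parallel_gap = cauchy_schwarz_gap([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(smallest_gap >= 0)    # the inequality held in every trial
print(parallel_gap)         # zero up to rounding: y is a multiple of x
```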


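For example, if we take V to be $\mathbb{R}^n$ and consider the standard inner
product on $\mathbb{R}^n$, then for all $x, y \in \mathbb{R}^n$, the Cauchy-Schwarz inequality
tells us that

$$\left| \sum_{i=1}^{n} x_iy_i \right| \le \sqrt{\sum_{i=1}^{n} x_i^2}\;\sqrt{\sum_{i=1}^{n} y_i^2}.$$


10.2 Orthogonality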


10.2.1 Orthogonal vectors 


We are now ready to extend the concept of angle to an abstract inner
product space V. To do this, we begin with the result that in $\mathbb{R}^2$,
$\langle x, y \rangle = \|x\|\,\|y\| \cos\theta$, where $\theta$ is the angle between the vectors. This suggests
that we might, more generally (in an abstract inner product space),
define the cosine of the angle between non-zero vectors x and y to be

$$\cos\theta = \frac{\langle x, y \rangle}{\|x\|\,\|y\|}.$$

This definition will only make sense if we can show that this number
$\cos\theta$ is between −1 and 1. But this follows immediately from the
Cauchy-Schwarz inequality, which can be stated as

$$\frac{|\langle x, y \rangle|}{\|x\|\,\|y\|} \le 1.$$

The usefulness of this definition is in the concept of orthogonality.


Definition 10.9 (Orthogonal vectors) Suppose that V is an inner
product space. Then $x, y \in V$ are said to be orthogonal if and only
if $\langle x, y \rangle = 0$. We write $x \perp y$ to mean that x, y are orthogonal.


So what we have here is a definition of what it means, in a general 
inner product space, for two vectors to be orthogonal. As a special case 
of this, of course, we have the familiar notion of orthogonality in R” 
(when we use the standard inner product), but the key thing to stress is 
that this definition gives us a way to extend the notion of orthogonality 
to inner product spaces other than R”. 


Example 10.10 With the usual inner product on $\mathbb{R}^4$, the vectors
$x = (1, -1, 2, 0)^T$ and $y = (-1, 1, 1, 4)^T$ are orthogonal.


Activity 10.11 Check this! 


10.2.2 A generalised Pythagoras theorem 


We can now begin to imitate the geometry of vectors discussed in
Chapter 1. We are already familiar with Pythagoras’ theorem in $\mathbb{R}^2$,
which states that if c is the length of the longest side of a right-angled
triangle, and a and b the lengths of the other two sides, then
$c^2 = a^2 + b^2$. The generalised Pythagoras theorem is:


Theorem 10.12 (Generalised Pythagoras theorem) In an inner product
space V, if $x, y \in V$ are orthogonal, then

$$\|x + y\|^2 = \|x\|^2 + \|y\|^2.$$

Proof: This is fairly straightforward to prove. We know that for any z,
$\|z\|^2 = \langle z, z \rangle$, simply from the definition of the norm. So,

$$\begin{aligned}
\|x + y\|^2 &= \langle x + y, x + y \rangle \\
&= \langle x, x + y \rangle + \langle y, x + y \rangle \\
&= \langle x, x \rangle + \langle x, y \rangle + \langle y, x \rangle + \langle y, y \rangle \\
&= \|x\|^2 + 2\langle x, y \rangle + \|y\|^2 \\
&= \|x\|^2 + \|y\|^2,
\end{aligned}$$

where the last line follows from the fact that, x, y being orthogonal,
$\langle x, y \rangle = 0$.


We also have the triangle inequality for norms. In the special case of
the standard inner product on $\mathbb{R}^2$, this states the obvious fact that the
length of one side of a triangle must be no more than the sum of the
lengths of the other two sides.

Theorem 10.13 (Triangle inequality for norms) In an inner product
space V, if $x, y \in V$, then

$$\|x + y\| \le \|x\| + \|y\|.$$

Proof: We have

$$\begin{aligned}
\|x + y\|^2 &= \langle x + y, x + y \rangle \\
&= \langle x, x + y \rangle + \langle y, x + y \rangle \\
&= \langle x, x \rangle + \langle x, y \rangle + \langle y, x \rangle + \langle y, y \rangle \\
&= \|x\|^2 + 2\langle x, y \rangle + \|y\|^2 \\
&\le \|x\|^2 + \|y\|^2 + 2|\langle x, y \rangle| \\
&\le \|x\|^2 + \|y\|^2 + 2\|x\|\,\|y\| \\
&= (\|x\| + \|y\|)^2,
\end{aligned}$$

where the last inequality used is the Cauchy-Schwarz inequality. Thus,
$\|x + y\| \le \|x\| + \|y\|$, as required.
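
Both the generalised Pythagoras theorem and the triangle inequality can be checked on the orthogonal vectors of Example 10.10. The sketch below uses plain Python with the standard inner product on $\mathbb{R}^4$; the variable names are illustrative assumptions.

```python
import math

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

x = [1, -1, 2, 0]
y = [-1, 1, 1, 4]
s = [a + b for a, b in zip(x, y)]       # the vector x + y

print(inner(x, y))                      # 0, so x and y are orthogonal
print(inner(s, s))                      # ||x + y||^2 = 25
print(inner(x, x) + inner(y, y))        # ||x||^2 + ||y||^2 = 6 + 19 = 25

# Triangle inequality: ||x + y|| is at most ||x|| + ||y||.
lhs = math.sqrt(inner(s, s))
rhs = math.sqrt(inner(x, x)) + math.sqrt(inner(y, y))
print(lhs <= rhs)
```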


10.2.3 Orthogonality and linear independence 


If a set of (non-zero) vectors are pairwise orthogonal (that is, if any 
two are orthogonal), then it turns out that the vectors are linearly 
independent: 


Theorem 10.14 Suppose that V is an inner product space and that
vectors $v_1, v_2, \ldots, v_k \in V$ are pairwise orthogonal (meaning $v_i \perp v_j$
for $i \ne j$), and none is the zero vector. Then $\{v_1, v_2, \ldots, v_k\}$ is a linearly
independent set of vectors.


Proof. We need to show that if

$$\alpha_1v_1 + \alpha_2v_2 + \cdots + \alpha_kv_k = \mathbf{0}$$

(the zero vector), then $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$. Let i be any integer
between 1 and k. Then taking the inner product with $v_i$,

$$\langle v_i, \alpha_1v_1 + \alpha_2v_2 + \cdots + \alpha_kv_k \rangle = \langle v_i, \mathbf{0} \rangle = 0.$$

But

$$\begin{aligned}
\langle v_i, \alpha_1v_1 + \cdots + \alpha_kv_k \rangle = {} & \alpha_1\langle v_i, v_1 \rangle + \cdots + \alpha_{i-1}\langle v_i, v_{i-1} \rangle + \alpha_i\langle v_i, v_i \rangle \\
& + \alpha_{i+1}\langle v_i, v_{i+1} \rangle + \cdots + \alpha_k\langle v_i, v_k \rangle.
\end{aligned}$$

Since $\langle v_i, v_j \rangle = 0$ for $j \ne i$, this equals $\alpha_i\langle v_i, v_i \rangle$, which is $\alpha_i\|v_i\|^2$. So
we have $\alpha_i\|v_i\|^2 = 0$. Since $v_i \ne \mathbf{0}$, $\|v_i\|^2 \ne 0$ and hence $\alpha_i = 0$. But i
was any integer in the range 1 to k, so we deduce that

$$\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0,$$

as required.


10.3 Orthogonal matrices 
10.3.1 Definition of orthogonal matrix 


There is a particularly useful property that a matrix might possess, and
which has links with orthogonality of vectors. This is described in the
following definition.


Definition 10.15 (Orthogonal matrix) An $n \times n$ matrix P is said to
be orthogonal if $P^TP = PP^T = I$; that is, if P has inverse $P^T$.


Example 10.16 The matrix

$$P = \begin{pmatrix} 3/5 & 4/5 \\ -4/5 & 3/5 \end{pmatrix}$$

is orthogonal.


Activity 10.17 Check this! 


At first it appears that the definition of an orthogonal matrix has little
to do with the concept of orthogonality of vectors. But, as we shall see,
it is closely related. If P is an orthogonal matrix, then $P^TP = I$, the
identity matrix. Suppose that the columns of P are $x_1, x_2, \ldots, x_n$. Then
the fact that $P^TP = I$ means that $x_i^Tx_j = 0$ if $i \ne j$ and $x_i^Tx_i = 1$, as
the following theorem shows:


Theorem 10.18 A matrix P is orthogonal if and only if, as vectors, its
columns are pairwise orthogonal, and each has length 1.


Proof. Let $P = (x_1\ x_2\ \cdots\ x_n)$, so that $P^T$ is the matrix whose rows
are the vectors $x_1^T, x_2^T, \ldots, x_n^T$. Then $P^TP = I$ can be expressed as

$$\begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{pmatrix} (x_1\ x_2\ \cdots\ x_n) = \begin{pmatrix} x_1^Tx_1 & x_1^Tx_2 & \cdots & x_1^Tx_n \\ x_2^Tx_1 & x_2^Tx_2 & \cdots & x_2^Tx_n \\ \vdots & \vdots & \ddots & \vdots \\ x_n^Tx_1 & x_n^Tx_2 & \cdots & x_n^Tx_n \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$

The theorem is an ‘if and only if’ statement, so we must prove it both
ways.
If $P^TP = I$, then the two matrices on the right are equal, and so

$$x_i^Tx_i = \langle x_i, x_i \rangle = \|x_i\|^2 = 1 \quad\text{and}\quad x_i^Tx_j = \langle x_i, x_j \rangle = 0 \ \text{ if } i \ne j.$$

This says that the vectors are unit vectors and that the columns $x_i$, $x_j$
are orthogonal.


Conversely, if the columns are pairwise orthogonal unit vectors,
then

$$\|x_i\|^2 = \langle x_i, x_i \rangle = x_i^Tx_i = 1 \quad\text{and}\quad \langle x_i, x_j \rangle = x_i^Tx_j = 0 \ \text{ for } i \ne j,$$

so the matrix $P^TP$ is equal to the identity matrix.
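
The definition can be tested mechanically. The sketch below is plain Python using exact fractions, so that no rounding is involved; the helper names `transpose`, `matmul` and `is_orthogonal`, and the second matrix `Q`, are illustrative assumptions. The matrix `P` is the one with columns $(3/5, -4/5)^T$ and $(4/5, 3/5)^T$.

```python
from fractions import Fraction as F

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def is_orthogonal(p):
    """True if P^T P and P P^T both equal the identity matrix."""
    n = len(p)
    identity = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    pt = transpose(p)
    return matmul(pt, p) == identity and matmul(p, pt) == identity

P = [[F(3, 5), F(4, 5)],
     [F(-4, 5), F(3, 5)]]
Q = [[F(1), F(1)],
     [F(0), F(1)]]      # columns are not orthogonal

print(is_orthogonal(P))     # True
print(is_orthogonal(Q))     # False
```

Entry (i, j) of `matmul(transpose(P), P)` is exactly the inner product of columns i and j of P, which is the observation used in the proof of Theorem 10.18.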


10.3.2 Orthonormal sets 


Theorem 10.18 characterises orthogonal matrices through an important 
property of their columns. This important property is given a special 
name. 


Definition 10.19 (Orthonormal) A set of vectors $\{x_1, x_2, \ldots, x_k\}$ in
an inner product space V is said to be an orthonormal set if any two
different vectors in the set are orthogonal and each vector is a unit
vector; that is,

$$\langle x_i, x_j \rangle = 0 \ \text{ for } i \ne j \quad\text{and}\quad \|x_i\| = 1.$$


An important consequence of Theorem 10.14 is that an orthonormal
set of n vectors in an n-dimensional vector space is a basis. If
$\{v_1, v_2, \ldots, v_n\}$ is an orthonormal basis of a vector space V, then the
coordinates of any vector $w \in V$ are easy to calculate, as shown in the
following theorem.


Theorem 10.20 Let $B = \{v_1, v_2, \ldots, v_n\}$ be an orthonormal basis of
a vector space V and let $w \in V$. Then the coordinates $a_1, a_2, \ldots, a_n$ of
w in the basis B are given by

$$a_i = \langle w, v_i \rangle.$$

Proof. We have $w = a_1v_1 + a_2v_2 + \cdots + a_nv_n$. We calculate the inner
product of w with a basis vector $v_i$:

$$\begin{aligned}
\langle w, v_i \rangle &= \langle a_1v_1 + a_2v_2 + \cdots + a_nv_n, v_i \rangle \\
&= a_1\langle v_1, v_i \rangle + a_2\langle v_2, v_i \rangle + \cdots + a_n\langle v_n, v_i \rangle \\
&= a_i\langle v_i, v_i \rangle \\
&= a_i.
\end{aligned}$$

The last two equalities follow from the fact that $\{v_1, v_2, \ldots, v_n\}$ is an
orthonormal set.
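
As a small illustration of Theorem 10.20, the following Python sketch finds coordinates with respect to an orthonormal basis of $\mathbb{R}^2$ by taking inner products, then rebuilds the vector from them. The basis (the columns of the matrix in Example 10.16), the vector `w` and the function name `coordinates` are illustrative assumptions; exact fractions avoid rounding.

```python
from fractions import Fraction as F

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def coordinates(w, basis):
    """Coordinates a_i = <w, v_i> of w in an orthonormal basis v_1, ..., v_n."""
    return [inner(w, v) for v in basis]

basis = [[F(3, 5), F(-4, 5)],   # v_1
         [F(4, 5), F(3, 5)]]    # v_2
w = [F(1), F(2)]

coords = coordinates(w, basis)
# Rebuild w as a_1 v_1 + a_2 v_2 to confirm the coordinates.
rebuilt = [sum(a * v[i] for a, v in zip(coords, basis)) for i in range(len(w))]

print(coords)           # the coordinates -1 and 2
print(rebuilt == w)     # True
```

No linear system has to be solved; this shortcut is only valid because the basis is orthonormal.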


If P is an orthogonal matrix, then its columns form an orthonormal 
basis. So we can restate Theorem 10.18 as follows. 


Theorem 10.21 An $n \times n$ matrix P is orthogonal if and only if the
columns of P form an orthonormal basis of $\mathbb{R}^n$.


If the matrix P is orthogonal, then since $P = (P^T)^T$, the matrix $P^T$ is
orthogonal too.


Activity 10.22 Show that if P is orthogonal, so too is PT. 


It therefore follows that Theorem 10.21 remains true if column is 
replaced by row, with rows written as vectors, and we can make the 
following stronger statement: a matrix P is orthogonal if and only if the 
columns (or rows, written as vectors) of P form an orthonormal basis 
of R”. 


10.4 Gram-Schmidt orthonormalisation process 


Given a set of linearly independent vectors $\{v_1, v_2, \ldots, v_k\}$, the
Gram-Schmidt orthonormalisation process is a way of producing k vectors that
span the same space as $\{v_1, v_2, \ldots, v_k\}$, and that form an orthonormal
set. That is, the process produces a set $\{u_1, u_2, \ldots, u_k\}$ such that:

• $\mathrm{Lin}\{u_1, u_2, \ldots, u_k\} = \mathrm{Lin}\{v_1, v_2, \ldots, v_k\}$

• $\{u_1, u_2, \ldots, u_k\}$ is an orthonormal set.


We will see in the next chapter why this is a useful process to be able
to perform. It works as follows. First, we set

$$u_1 = \frac{v_1}{\|v_1\|}$$

so that $u_1$ is a unit vector and $\mathrm{Lin}\{u_1\} = \mathrm{Lin}\{v_1\}$.


Then we define

$$w_2 = v_2 - \langle v_2, u_1 \rangle u_1,$$

and set

$$u_2 = \frac{w_2}{\|w_2\|}.$$

Then $\{u_1, u_2\}$ is an orthonormal set and $\mathrm{Lin}\{u_1, u_2\} = \mathrm{Lin}\{v_1, v_2\}$.

Activity 10.23 Make sure you understand why this works. Show that
$w_2 \perp u_1$ and conclude that $u_2 \perp u_1$. Why are the linear spans of $\{u_1, u_2\}$
and $\{v_1, v_2\}$ the same?


Next, we define

$$w_3 = v_3 - \langle v_3, u_1 \rangle u_1 - \langle v_3, u_2 \rangle u_2$$

and set

$$u_3 = \frac{w_3}{\|w_3\|}.$$

Then $\{u_1, u_2, u_3\}$ is an orthonormal set and $\mathrm{Lin}\{u_1, u_2, u_3\}$ is the same
as $\mathrm{Lin}\{v_1, v_2, v_3\}$. Generally, when we have $u_1, u_2, \ldots, u_i$, we let

$$w_{i+1} = v_{i+1} - \sum_{j=1}^{i} \langle v_{i+1}, u_j \rangle u_j, \qquad u_{i+1} = \frac{w_{i+1}}{\|w_{i+1}\|}.$$

Then the resulting set $\{u_1, u_2, \ldots, u_k\}$ has the required properties.
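
The process translates almost line by line into code. The following is a minimal sketch in plain Python, assuming floating-point arithmetic and the standard inner product by default; the function name `gram_schmidt`, the tolerance `tol` and the sample vectors in $\mathbb{R}^3$ are illustrative assumptions. The inner product is passed as a parameter, so the same routine could be used with a different inner product.

```python
import math

def standard_inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors, inner=standard_inner, tol=1e-12):
    """Orthonormalise linearly independent vectors by the steps described above."""
    basis = []
    for v in vectors:
        w = list(v)
        # w = v - sum over j of <v, u_j> u_j, over the u_j found so far.
        for u in basis:
            coeff = inner(v, u)
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        length = math.sqrt(inner(w, w))
        if length < tol:
            raise ValueError("the vectors are not linearly independent")
        basis.append([wi / length for wi in w])     # normalise w
    return basis

vectors = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
us = gram_schmidt(vectors)

# The matrix of inner products <u_i, u_j> should be the identity, up to rounding.
for ui in us:
    print([round(standard_inner(ui, uj), 10) for uj in us])
```

If the input vectors were linearly dependent, some `w` would be the zero vector and could not be normalised, which is why the sketch raises an error in that case.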


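Example 10.24 In $\mathbb{R}^4$, let us find an orthonormal basis for the linear
span of the three vectors

$$v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} -1 \\ 4 \\ 4 \\ -1 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 4 \\ -2 \\ 2 \\ 0 \end{pmatrix}.$$

First, we have

$$u_1 = \frac{v_1}{\|v_1\|} = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/2 \\ 1/2 \\ 1/2 \\ 1/2 \end{pmatrix}.$$

Next, we have

$$w_2 = v_2 - \langle v_2, u_1 \rangle u_1 = \begin{pmatrix} -1 \\ 4 \\ 4 \\ -1 \end{pmatrix} - 3\begin{pmatrix} 1/2 \\ 1/2 \\ 1/2 \\ 1/2 \end{pmatrix} = \begin{pmatrix} -5/2 \\ 5/2 \\ 5/2 \\ -5/2 \end{pmatrix},$$

and we set

$$u_2 = \frac{w_2}{\|w_2\|} = \begin{pmatrix} -1/2 \\ 1/2 \\ 1/2 \\ -1/2 \end{pmatrix}.$$

(Note: to do this last step, we merely noted that a normalised vector
in the same direction as $w_2$ is also a normalised vector in the same
direction as $(-1, 1, 1, -1)^T$, and this second vector is easier to work
with.) At this stage, you should check that $u_2 \perp u_1$. Continuing, we
have

$$w_3 = v_3 - \langle v_3, u_1 \rangle u_1 - \langle v_3, u_2 \rangle u_2 = \begin{pmatrix} 4 \\ -2 \\ 2 \\ 0 \end{pmatrix} - 2\begin{pmatrix} 1/2 \\ 1/2 \\ 1/2 \\ 1/2 \end{pmatrix} - (-2)\begin{pmatrix} -1/2 \\ 1/2 \\ 1/2 \\ -1/2 \end{pmatrix} = \begin{pmatrix} 2 \\ -2 \\ 2 \\ -2 \end{pmatrix}.$$

Then,

$$u_3 = \frac{w_3}{\|w_3\|} = (1/2, -1/2, 1/2, -1/2)^T.$$

So

$$\{u_1, u_2, u_3\} = \left\{ \begin{pmatrix} 1/2 \\ 1/2 \\ 1/2 \\ 1/2 \end{pmatrix}, \begin{pmatrix} -1/2 \\ 1/2 \\ 1/2 \\ -1/2 \end{pmatrix}, \begin{pmatrix} 1/2 \\ -1/2 \\ 1/2 \\ -1/2 \end{pmatrix} \right\}.$$


Activity 10.25 Work through all the calculations in this example. Then
verify that the set $\{u_1, u_2, u_3\}$ is an orthonormal set.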


10.5 Learning outcomes 


You should now be able to: 


• explain what is meant by an inner product on a vector space
• verify that a given inner product is indeed an inner product
• compute norms in inner product spaces
• state and apply the Cauchy-Schwarz inequality, the generalised
Pythagoras theorem, and the triangle inequality for norms
• prove that orthogonality of a set of vectors implies linear
independence
• explain what is meant by an orthonormal set of vectors
• use the Gram-Schmidt orthonormalisation process
• state what is meant by an orthogonal matrix
• explain why an $n \times n$ matrix is orthogonal if and only if its columns
are an orthonormal basis of $\mathbb{R}^n$


10.6 Comments on activities


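Activity 10.2 By property (i), $\langle z, \alpha x + \beta y \rangle = \langle \alpha x + \beta y, z \rangle$. Then
applying property (ii), and then property (i) again, the result follows.


Activity 10.4 Since $\alpha p + \beta q$ is the polynomial function

$$x \mapsto \alpha p(x) + \beta q(x),$$

we have

$$\begin{aligned}
\langle \alpha p + \beta q, r \rangle &= \sum_{i=1}^{n+1} \bigl(\alpha p(x_i) + \beta q(x_i)\bigr) r(x_i) \\
&= \alpha \sum_{i=1}^{n+1} p(x_i)r(x_i) + \beta \sum_{i=1}^{n+1} q(x_i)r(x_i) \\
&= \alpha\langle p, r \rangle + \beta\langle q, r \rangle,
\end{aligned}$$

as required.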


Activity 10.8 By the quadratic formula, the solutions of

$$at^2 + bt + c = 0$$

are given by

$$t = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$

If $b^2 - 4ac > 0$, then this will have two real solutions, so the graph of
the function $f(t) = at^2 + bt + c$ will cross the t axis twice, and so it
must have both positive and negative values. Therefore it would not be
true that $at^2 + bt + c \ge 0$ for all $t \in \mathbb{R}$.


Activity 10.11 Just check that $\langle x, y \rangle = 0$.


Activity 10.17 Multiply $P^TP$ and show that you get the identity matrix.


Activity 10.22 The matrix P is orthogonal if and only if $PP^T =
P^TP = I$. Since $(P^T)^T = P$, this statement can be written as
$(P^T)^TP^T = P^T(P^T)^T = I$, which says that $P^T$ is orthogonal.


Activity 10.23 We have

$$\langle w_2, u_1 \rangle = \langle v_2 - \langle v_2, u_1 \rangle u_1, u_1 \rangle = \langle v_2, u_1 \rangle - \langle v_2, u_1 \rangle \langle u_1, u_1 \rangle = 0.$$

The fact that $w_2 \perp u_1$ if and only if $u_2 \perp u_1$ follows from property (ii)
of the definition of inner product since $w_2 = \alpha u_2$ for some constant $\alpha$.
The linear spans are the same because $u_1, u_2$ are linear combinations
of $v_1, v_2$ and conversely.


Activity 10.25 We only need to check that each $u_i$ satisfies $\|u_i\| = 1$,
and that $\langle u_1, u_2 \rangle = \langle u_1, u_3 \rangle = \langle u_2, u_3 \rangle = 0$. All of this is very easily
checked. (It is much harder to find the $u_i$ in the first place. But once you
think you have found them, it is always fairly easy to check whether
they form an orthonormal set, as they should.)


10.7 Exercises 


Exercise 10.1 Let V be the vector space of all $m \times n$ real matrices
(with matrix addition and scalar multiplication). Define, for $A = (a_{ij})$
and $B = (b_{ij}) \in V$,

$$\langle A, B \rangle = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}b_{ij}.$$

Prove that this is an inner product on V.


Exercise 10.2 Prove that in any inner product space V,

$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2,$$

for all $x, y \in V$.


Exercise 10.3 Suppose that $v \in \mathbb{R}^n$. Prove that $W = \{x \in \mathbb{R}^n \mid x \perp v\}$,
the set of vectors orthogonal to v, is a subspace of $\mathbb{R}^n$. How would you
describe this subspace geometrically?

More generally, suppose that S is any (not necessarily finite) set of
vectors in $\mathbb{R}^n$ and let $S^\perp$ denote the set

$$S^\perp = \{x \in \mathbb{R}^n \mid x \perp v \text{ for all } v \in S\}.$$

Prove that $S^\perp$ is a subspace of $\mathbb{R}^n$.


Exercise 10.4 Show that if P is an orthogonal matrix, then $|P| = \pm 1$.


Exercise 10.5 Consider the mapping from pairs of vectors x, y € R? 
to the real numbers given by 


(x, y) =x! Ay, with A= G D ; 


where the 1 x 1 matrix x! Ay is interpreted as the real number which is 
its only entry. Show that this is an inner product on R?. 


o e) G) 




(a) Find (v, w) under this inner product. 

(b) Find the length of the vector v in the norm defined by this inner 
product. 

(c) Find the set of all vectors which are orthogonal to the vector v
under this inner product. That is, if S = Lin(v), find

$$S^\perp = \{x \in \mathbb{R}^2 \mid x \perp v\}.$$

Write down a basis of $S^\perp$.

(d) Express the vector w above as $w = w_1 + w_2$ where $w_1 \in S$ and
$w_2 \in S^\perp$.

(e) Write down an orthonormal basis of R? with respect to this inner 
product. 


Exercise 10.6 Use the Gram-Schmidt process to find an orthonormal 
basis for the subspace of R4 spanned by the vectors 


1 


1 0 
_ {0 2 1 
Ys gy Pe 1 2 
0 1 1 
Exercise 10.7 Let 


f- 


Find an orthonormal basis of W. Extend it to an orthonormal basis of 
R3, 


r-3+3z=0}, 


10.8 Problems 


Problem 10.1 Consider the vectors 


, b= 
1 0 


Show that the vectors a, b, b — a form an isosceles right-angled trian- 
gle. Verify that these vectors satisfy the generalised Pythagoras theorem. 


1 
es 1 
ee 


me W N 


Problem 10.2 Let A be an $m \times k$ matrix with full column rank, meaning
that rank(A) = k.


(a) Show that $A^TA$ is a $k \times k$ symmetric matrix. Show also that
$x^T(A^TA)x > 0$ for all $x \ne \mathbf{0}$, $x \in \mathbb{R}^k$.
(b) Using the results in part (a), show that the mapping from pairs of
vectors in $\mathbb{R}^k$ to the real numbers given by the rule

$$\langle x, y \rangle = x^T(A^TA)y$$

defines an inner product on $\mathbb{R}^k$, where the $1 \times 1$ matrix $x^T(A^TA)y$
is identified with the scalar which is its unique entry.


Problem 10.3 If P is an orthogonal n x n matrix and x = Pz, show 
that ||x|| = ||z|| using the standard inner product on R”. 


Problem 10.4 Let P be an orthogonal $n \times n$ matrix and let T be the
linear transformation defined by $T(x) = Px$. Using the standard inner
product, show that for any $x, y \in \mathbb{R}^n$,

$$\langle T(x), T(y) \rangle = \langle x, y \rangle.$$


Problem 10.5 Suppose T and S are linear transformations of R? to R? 
with respective matrices: 


3 T 1 0 
A= (9 2) As=(% ae 
v2 V2 

Describe the effect of T and S in words. Show that both $A_T$ and $A_S$
are orthogonal matrices.

Write down the matrix A that represents the linear transformation
of $\mathbb{R}^2$ which is a rotation anticlockwise about the origin by an angle $\theta$.
Show that A is an orthogonal matrix.


Problem 10.6 Find an orthonormal basis for the subspace of R? given 


by 
x 
r={()] | sr-y+2z=0}. 
Z 


Extend this to an orthonormal basis of R3. 


Problem 10.7 Show that S = {v1, v2, v3} is a basis of R*, where 


0 C 0 


Beginning with the vector $v_1$, find an orthonormal basis of the subspace
$\mathrm{Lin}\{v_1, v_2\}$. Using any method, extend this to an orthonormal basis B
of $\mathbb{R}^3$.

Find the B coordinates of the vectors $v_2$ and $v_3$.

Find the transition matrix P from coordinates in the basis S to
coordinates in the basis B.

Check that $[v_3]_B = P[v_3]_S$.


Problem 10.8 Beginning with the vector $v_1$, use the Gram-Schmidt
orthonormalisation process to obtain an orthonormal basis for the
subspace of $\mathbb{R}^4$ spanned by the following vectors:

$$v_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 3 \\ 0 \\ 2 \\ 0 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 2 \\ 1 \\ 1 \\ 3 \end{pmatrix}.$$
0 0 3 


Problem 10.9 Put the following matrix into reduced row echelon form: 


1 1 -1 2 
a=(-1 0 1 i): 
E 2 =I -5 


Find an orthonormal basis of the null space of A. Extend this to an
orthonormal basis of $\mathbb{R}^4$ using the row space of A.


Problem 10.10 Consider the planes in R?: 


f(r C 


Find the vector equation of the line of intersection of U and V. 
Find vectors x, y, z in R? with the following properties: 


ir t2y+z=0. 


(i) The vector x is on both planes, that is, x € UNV; 
(ii) The set {x, y} is an orthonormal basis of U; 
(iii) The set {x, z} is an orthonormal basis of V. 


Is your set {x, y, z} a basis of R*? Is it an orthonormal basis of R*? 
Justify your answers. 


11 


Orthogonal diagonalisation 
and its applications 


In this chapter, we look at orthogonal diagonalisation, a special form 
of diagonalisation for real symmetric matrices. This has some useful 
applications: to quadratic forms, in particular. 


11.1 Orthogonal diagonalisation of symmetric 
matrices 


Recall that a square matrix $A = (a_{ij})$ is symmetric if $A^T = A$.
Equivalently, A is symmetric if $a_{ij} = a_{ji}$ for all i, j; that is, if the entries in
opposite positions relative to the main diagonal are equal. It turns out
that symmetric matrices are always diagonalisable. They are, furthermore,
diagonalisable in a special way.


11.1.1 Orthogonal diagonalisation 


We know what it means to diagonalise a square matrix $A$. It means to find
an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$.
If, in addition, we can find an orthogonal matrix $P$ which diagonalises $A$,
so that $P^{-1}AP = P^TAP = D$, then this is orthogonal diagonalisation.


Definition 11.1 A matrix $A$ is said to be orthogonally diagonalisable
if there is an orthogonal matrix $P$ such that $P^TAP = D$, where $D$ is a
diagonal matrix.


As $P$ is orthogonal, $P^T = P^{-1}$, so $P^TAP = P^{-1}AP = D$. The fact
that $A$ is diagonalisable means that the columns of $P$ are a basis of
$\mathbb{R}^n$ consisting of eigenvectors of $A$ (Theorem 8.22). The fact that $A$
is orthogonally diagonalisable means that the columns of $P$ are an
orthonormal basis of $\mathbb{R}^n$ consisting of an orthonormal set of
eigenvectors of $A$ (Theorem 10.21). Putting these facts together, we have the
following theorem:


Theorem 11.2 A matrix $A$ is orthogonally diagonalisable if and only
if there is an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of $A$.


Let’s look at some examples. 


Example 11.3 The matrix

\[
B = \begin{pmatrix} 7 & -15 \\ 2 & -4 \end{pmatrix},
\]

which we have met in previous examples, cannot be orthogonally
diagonalised. The eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 2$. All the
eigenvectors corresponding to $\lambda = 1$ are scalar multiples of $\mathbf{v}_1 = (5, 2)^T$, and
all the eigenvectors corresponding to $\lambda = 2$ are scalar multiples of
$\mathbf{v}_2 = (3, 1)^T$. Since

\[
\langle \mathbf{v}_1, \mathbf{v}_2 \rangle = 5(3) + 2(1) = 17 \neq 0,
\]

no eigenvector in the eigenspace of $\lambda_1$ is perpendicular to any
eigenvector for $\lambda_2$, so it is not possible to find an orthogonal set of eigenvectors
for $B$.


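Example 11.4 Now consider the matrix

\[
A = \begin{pmatrix} 5 & -3 \\ -3 & 5 \end{pmatrix}.
\]

The eigenvalues are given by

\[
|A - \lambda I| = \begin{vmatrix} 5 - \lambda & -3 \\ -3 & 5 - \lambda \end{vmatrix} = \lambda^2 - 10\lambda + 16 = 0.
\]

So the eigenvalues are $\lambda_1 = 2$ and $\lambda_2 = 8$. The corresponding
eigenvectors are the solutions of $(A - \lambda I)\mathbf{v} = \mathbf{0}$, so

\[
A - 2I = \begin{pmatrix} 3 & -3 \\ -3 & 3 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & -1 \\ 0 & 0 \end{pmatrix} \implies \mathbf{w}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix},
\]
\[
A - 8I = \begin{pmatrix} -3 & -3 \\ -3 & -3 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \implies \mathbf{w}_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.
\]

Because $\langle \mathbf{w}_1, \mathbf{w}_2 \rangle = 0$, the eigenvectors $\mathbf{w}_1$ and $\mathbf{w}_2$ are orthogonal! So $A$
can be orthogonally diagonalised. We just need to normalise the vectors
by making them into unit vectors. If

\[
P = \begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 2 & 0 \\ 0 & 8 \end{pmatrix},
\]

then $P$ is orthogonal and $P^TAP = P^{-1}AP = D$.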


Note that the matrix A in this example is symmetric, whereas the matrix 
B in the first example is not. 
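
The conclusion of Example 11.4 can be checked numerically. The following is a minimal Python sketch using only the standard library; the helper names `transpose` and `matmul` are our own illustrative choices, not anything defined in the text. The arithmetic is floating point, so the products agree with the identity matrix and with diag(2, 8) only up to rounding error.

```python
import math

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

A = [[5, -3], [-3, 5]]
s = 1 / math.sqrt(2)
# Columns of P are the unit eigenvectors for the eigenvalues 2 and 8.
P = [[s, -s], [s, s]]

PtP = matmul(transpose(P), P)            # close to the identity if P is orthogonal
D = matmul(matmul(transpose(P), A), P)   # close to diag(2, 8)

print(PtP)
print(D)
```

Swapping the two columns of `P` would swap the diagonal entries of `D`: the order of the eigenvalues in $D$ follows the order of the eigenvectors in $P$.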


11.1.2 When is orthogonal diagonalisation possible? 


It’s natural to ask which matrices can be orthogonally diagonalised. 
The answer is remarkably straightforward and is given by the following 
important result. 


Theorem 11.5 (Spectral theorem for symmetric matrices) The 
matrix A is orthogonally diagonalisable if and only if A is symmetric. 


Since this is an if and only if statement, it needs to be proved in both 
directions. One way is easy: if A can be orthogonally diagonalised, 
then it must be symmetric. 


Activity 11.6 Try to prove this yourself before you continue reading. 
Assuming that A can be orthogonally diagonalised, write down what 
this means, and then show that AT = A. 


The argument goes as follows. If $A$ is orthogonally diagonalisable, then
there exist an orthogonal matrix $P$ and a diagonal matrix $D$ such that
$P^TAP = P^{-1}AP = D$. Then solving for the matrix $A$,

\[
A = PDP^{-1} = PDP^T.
\]

Taking the transposes of both sides of this equation (using properties
of transpose), and using the fact that $D^T = D$ since $D$ is diagonal, we
have

\[
A^T = (PDP^T)^T = PD^TP^T = PDP^T = A,
\]

which shows that $A$ is symmetric.

So only symmetric matrices can be orthogonally diagonalised.
That’s the ‘only if’ part of the proof. It is much more difficult to prove
the ‘if’ part: if a matrix is symmetric, then it can be orthogonally
diagonalised. We will first prove this for the special case in which the matrix
has distinct eigenvalues, and then prove it for the more general case in
Section 11.1.5.

For both of these, we will need one important fact about symmetric 
matrices: symmetric matrices have only real eigenvalues. (As we noted 
in Theorem 8.42, this is a necessary condition for diagonalisability, so 
we certainly need it.) We state it here as a theorem, but we will defer 
the proof of this fact until Chapter 13, as it is most easily established 
as a corollary (that is, a consequence) of a similar theorem on complex 
matrices. 


Theorem 11.7 If A is a symmetric matrix, then all of its eigenvalues 
are real numbers. 


This means that the characteristic polynomial of an n x n symmetric 
matrix factorises into n linear factors over the real numbers (repeating 
any roots with multiplicity greater than 1). 


11.1.3 The case of distinct eigenvalues 


Assuming Theorem 11.7, we now prove Theorem 11.5 for symmetric
$n \times n$ matrices which have $n$ different eigenvalues. To do so, we need
the following result:


Theorem 11.8 If the matrix $A$ is symmetric, then eigenvectors
corresponding to distinct eigenvalues are orthogonal.


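Proof. Suppose that $\lambda$ and $\mu$ are any two different eigenvalues of $A$ and
that $\mathbf{x}$, $\mathbf{y}$ are corresponding eigenvectors. Then $A\mathbf{x} = \lambda\mathbf{x}$ and $A\mathbf{y} = \mu\mathbf{y}$.
The trick in this proof is to find two different expressions for the product
$\mathbf{x}^TA\mathbf{y}$ (which then, of course, must be equal to each other). Note that
the matrix product $\mathbf{x}^TA\mathbf{y}$ is a $1 \times 1$ matrix or, equivalently, a number.
First, since $A\mathbf{y} = \mu\mathbf{y}$, we have

\[
\mathbf{x}^TA\mathbf{y} = \mathbf{x}^T(A\mathbf{y}) = \mathbf{x}^T(\mu\mathbf{y}) = \mu\,\mathbf{x}^T\mathbf{y}.
\]

But also, $A\mathbf{x} = \lambda\mathbf{x}$. Since $A$ is symmetric, $A = A^T$. Substituting and
using the properties of the transpose of a matrix, we have

\[
\mathbf{x}^TA\mathbf{y} = \mathbf{x}^TA^T\mathbf{y} = (\mathbf{x}^TA^T)\mathbf{y} = (A\mathbf{x})^T\mathbf{y} = (\lambda\mathbf{x})^T\mathbf{y} = \lambda\,\mathbf{x}^T\mathbf{y}.
\]

Equating these two different expressions for $\mathbf{x}^TA\mathbf{y}$, we have
$\mu\,\mathbf{x}^T\mathbf{y} = \lambda\,\mathbf{x}^T\mathbf{y}$, or

\[
(\mu - \lambda)\,\mathbf{x}^T\mathbf{y} = 0.
\]

But since $\lambda \neq \mu$ (they are different eigenvalues), we have $\mu - \lambda \neq 0$.
We deduce therefore that $\mathbf{x}^T\mathbf{y} = \langle \mathbf{x}, \mathbf{y} \rangle = 0$. But this says precisely that
$\mathbf{x}$ and $\mathbf{y}$ are orthogonal.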


Theorem 11.8 shows that if an $n \times n$ symmetric matrix has exactly $n$
different eigenvalues and if we take a set of $n$ eigenvectors with one
eigenvector corresponding to each eigenvalue, then any two of these
eigenvectors are orthogonal to one another. We may take the
eigenvectors to have length 1, simply by normalising them. This shows that we
have an orthonormal set of $n$ eigenvectors, which is therefore a basis of
$\mathbb{R}^n$. So by Theorem 11.2, the matrix can be orthogonally diagonalised.
But let’s spell it out. If $P$ is the matrix with this set of eigenvectors
as its columns, then (as usual) $P^{-1}AP = D$, the diagonal matrix of
eigenvalues. Moreover, since the columns of $P$ form an orthonormal
set, by Theorem 10.18, $P$ is an orthogonal matrix. So $P^{-1} = P^T$ and
hence $P^TAP = D$. In other words, we have the following result (which
outlines the method):


Theorem 11.9 Suppose that $A$ is symmetric and has $n$ different
eigenvalues. Take $n$ corresponding unit eigenvectors, each of length 1. Form
the matrix $P$ which has these unit eigenvectors as its columns. Then
$P^{-1} = P^T$ (that is, $P$ is an orthogonal matrix) and $P^TAP = D$, the
diagonal matrix whose entries are the eigenvalues of $A$.


Here is an example of the technique. 


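Example 11.10 The matrix

\[
A = \begin{pmatrix} 4 & 0 & 4 \\ 0 & 4 & 4 \\ 4 & 4 & 8 \end{pmatrix}
\]

is symmetric. We have seen in Example 8.23 that it has three distinct
eigenvalues, $\lambda_1 = 4$, $\lambda_2 = 0$, $\lambda_3 = 12$, and we found that
corresponding eigenvectors are (in that order)

\[
\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \quad
\begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \quad
\begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}.
\]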


Activity 11.11 Check that any two of these three eigenvectors are 
orthogonal. 


These eigenvectors are mutually orthogonal, but not of length 1, so we
normalise them. For example, the first one has length $\sqrt{2}$. If we divide
each entry of it by $\sqrt{2}$, we obtain a unit eigenvector. We can similarly
normalise the other two vectors, obtaining

\[
\mathbf{u}_1 = \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \\ 0 \end{pmatrix}, \quad
\mathbf{u}_2 = \begin{pmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ -1/\sqrt{3} \end{pmatrix}, \quad
\mathbf{u}_3 = \begin{pmatrix} 1/\sqrt{6} \\ 1/\sqrt{6} \\ 2/\sqrt{6} \end{pmatrix}.
\]


Activity 11.12 Verify that the normalisations of the second and third
vectors are as just stated.


We now form the matrix $P$ whose columns are these unit eigenvectors:

\[
P = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{3} & 1/\sqrt{6} \\ -1/\sqrt{2} & 1/\sqrt{3} & 1/\sqrt{6} \\ 0 & -1/\sqrt{3} & 2/\sqrt{6} \end{pmatrix}.
\]

Then $P$ is orthogonal and $P^TAP = D = \mathrm{diag}(4, 0, 12)$.


Activity 11.13 Check that $P$ is orthogonal by calculating $P^TP$.
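
The method of Theorem 11.9, as applied in Example 11.10, can be sketched in a few lines of Python. This is only an illustrative check under stated assumptions: the eigenvectors are supplied by hand rather than computed, the helper names (`normalise`, `transpose`, `matmul`) are hypothetical, and the arithmetic is floating point, so $P^TAP$ matches diag(4, 0, 12) only up to rounding error.

```python
import math

def normalise(v):
    """Scale a non-zero vector to length 1."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

A = [[4, 0, 4], [0, 4, 4], [4, 4, 8]]
# Eigenvectors for the eigenvalues 4, 0 and 12, in that order.
eigenvectors = [[1, -1, 0], [1, 1, -1], [1, 1, 2]]

# The unit eigenvectors become the columns of P.
P = transpose([normalise(v) for v in eigenvectors])
PtP = matmul(transpose(P), P)
D = matmul(matmul(transpose(P), A), P)

for row in D:
    print([round(entry, 10) for entry in row])
```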


11.1.4 When eigenvalues are not distinct 


We have seen that if a symmetric matrix has distinct eigenvalues, then 
(since eigenvectors corresponding to different eigenvalues are orthog- 
onal) it is orthogonally diagonalisable. But, as stated in Theorem 11.5, 
all n x n symmetric matrices are orthogonally diagonalisable, even if 
they do not have n distinct eigenvalues. We will prove this in the next 
section, but first we discuss how, in practice, we would go about orthog- 
onally diagonalising a matrix in the case when it does not have distinct 
eigenvalues. 

What we need for orthogonal diagonalisation is an orthonormal 
set of n eigenvectors. As we have seen, if it so happens that there are 
n different eigenvalues, then any set of n corresponding eigenvectors 
form a pairwise orthogonal set of vectors, and all we need to do is 
normalise each vector. However, if we have repeated eigenvalues, more 
care is required. 

Suppose that $\lambda_0$ is a repeated eigenvalue of $A$, by which we mean
that, for some $k \geq 2$, $(\lambda - \lambda_0)^k$ is a factor of the characteristic
polynomial of $A$. As we saw in Definition 8.39, the algebraic multiplicity of $\lambda_0$
is the largest $k$ for which this is the case. The eigenspace corresponding
to $\lambda_0$ is (see Definition 8.9)

\[
E(\lambda_0) = \{\mathbf{x} \mid (A - \lambda_0 I)\mathbf{x} = \mathbf{0}\},
\]

the subspace consisting of all eigenvectors corresponding to $\lambda_0$, together
with the zero-vector $\mathbf{0}$. It turns out (and, indeed, by Theorem 8.42,
it must be the case if $A$ is diagonalisable) that, for any symmetric
matrix $A$, the dimension of $E(\lambda_0)$ (that is, the geometric multiplicity)
is exactly the algebraic multiplicity $k$ of $\lambda_0$. This means that there is
some basis $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ of $k$ vectors of the eigenspace $E(\lambda_0)$. So far,
we are proceeding just as we would in diagonalisation, generally. But
remember that we are trying to orthogonally diagonalise. We therefore
use the Gram-Schmidt orthonormalisation process to take any such
basis and produce an orthonormal basis of $E(\lambda_0)$.

Since, by Theorem 11.8, eigenvectors from different eigenspaces
are orthogonal (and hence linearly independent), if we construct a set of
$n$ vectors by taking orthonormal bases for each of the eigenspaces, the
resulting set is an orthonormal basis of $\mathbb{R}^n$. We can therefore
orthogonally diagonalise the matrix $A$ by means of the matrix $P$ with these
vectors as its columns. Here is an example of how we can carry out this
process.


Example 11.14 We orthogonally diagonalise the symmetric matrix 


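\[
B = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}.
\]

The eigenvalues of $B$ are given by the characteristic equation

\[
|B - \lambda I| = -\lambda^3 + 6\lambda^2 - 9\lambda + 4 = -(\lambda - 1)^2(\lambda - 4) = 0.
\]

The eigenvalues are 4 and 1, where 1 is an eigenvalue of multiplicity 2.
We will find the eigenvectors for $\lambda = 1$ first. Reducing $B - I$, we
have

\[
\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\]

so the eigenspace for $\lambda = 1$ does indeed have dimension 2. From the
reduced row echelon form, we deduce the linearly independent
eigenvectors

\[
\mathbf{v}_1 = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}.
\]

Similarly, for $\lambda = 4$, reducing $B - 4I$ gives

\[
\begin{pmatrix} -2 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & -2 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{pmatrix},
\]

so we may take

\[
\mathbf{v}_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.
\]

The vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are not orthogonal. However, each of the vectors
$\mathbf{v}_1$ and $\mathbf{v}_2$ is orthogonal to $\mathbf{v}_3$, the eigenvector for $\lambda = 4$. (This must be
the case since they correspond to distinct eigenvalues.)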


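Activity 11.15 Check that $\langle \mathbf{v}_1, \mathbf{v}_3 \rangle = 0$, $\langle \mathbf{v}_2, \mathbf{v}_3 \rangle = 0$, and $\langle \mathbf{v}_1, \mathbf{v}_2 \rangle \neq 0$.


Notice that the eigenspace for $\lambda = 1$ can be described geometrically
as a plane through the origin in $\mathbb{R}^3$ with normal vector $\mathbf{v}_3$. It consists
of all linear combinations of $\mathbf{v}_1$ and $\mathbf{v}_2$; that is, all vectors which are
perpendicular to $\mathbf{v}_3$.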


Activity 11.16 Look at the reduced row echelon form of the matrix 
B — I. Could you have deduced the last eigenvector from this matrix? 
Why? 


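We still need to obtain an orthonormal basis of eigenvectors, so we now
apply the Gram-Schmidt orthonormalisation process to $\mathrm{Lin}\{\mathbf{v}_1, \mathbf{v}_2\}$.
First we set

\[
\mathbf{u}_1 = \begin{pmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{pmatrix}.
\]

Then we define

\[
\mathbf{w}_2 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} - \left\langle \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{pmatrix} \right\rangle \begin{pmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{pmatrix} = \begin{pmatrix} -1/2 \\ -1/2 \\ 1 \end{pmatrix}.
\]

This vector is parallel to $(-1, -1, 2)^T$, which has length $\sqrt{6}$, so we have

\[
\mathbf{u}_2 = \begin{pmatrix} -1/\sqrt{6} \\ -1/\sqrt{6} \\ 2/\sqrt{6} \end{pmatrix}.
\]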

Activity 11.17 What should you check now, before you proceed to the 
next step? 


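Normalising the vector $\mathbf{v}_3$, we can let $P$ be the matrix

\[
P = \begin{pmatrix} -1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 0 & 2/\sqrt{6} & 1/\sqrt{3} \end{pmatrix}
\]

and $D$ the diagonal matrix

\[
D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 4 \end{pmatrix}.
\]

Then $P^T = P^{-1}$ and $P^TBP = D$.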
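
The procedure of Example 11.14 (Gram-Schmidt within the repeated eigenspace, then normalising the remaining eigenvector) can be sketched in Python as follows. This is a minimal illustration, assuming the eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ have already been found by hand; the function names `dot`, `gram_schmidt` and `apply` are our own and the arithmetic is floating point.

```python
import math

def dot(u, v):
    """Standard inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalise a list of linearly independent vectors, in order."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = dot(v, u)  # component of v along the unit vector u
            w = [wi - c * ui for wi, ui in zip(w, u)]
        length = math.sqrt(dot(w, w))
        basis.append([wi / length for wi in w])
    return basis

def apply(m, v):
    """Matrix-vector product, with the matrix stored as a list of rows."""
    return [dot(row, v) for row in m]

B = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
v1, v2, v3 = [-1, 1, 0], [-1, 0, 1], [1, 1, 1]

u1, u2 = gram_schmidt([v1, v2])   # orthonormal basis of the eigenspace for eigenvalue 1
(u3,) = gram_schmidt([v3])        # unit eigenvector for eigenvalue 4
columns = [u1, u2, u3]            # the columns of P

# Entry (i, j) of P^T B P is the inner product of u_i with B u_j.
D = [[dot(ui, apply(B, uj)) for uj in columns] for ui in columns]
for row in D:
    print([round(x, 10) for x in row])
```

Applying Gram-Schmidt to all three eigenvectors at once would give the same result here, since $\mathbf{v}_3$ is already orthogonal to the eigenspace for the eigenvalue 1.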


11.1.5 The general case 


Assuming for now the fact that symmetric matrices have only real eigen- 
values (Theorem 11.7, which will be proved in Chapter 13), we have 
proved Theorem 11.5 for symmetric matrices with distinct eigenvalues.
We have also indicated how, in practice, to orthogonally diagonalise 
any symmetric matrix (in general, even if the eigenvalues are not dis- 
tinct). To complete the picture, we will now prove the Spectral theorem 
for symmetric matrices in general. This is a fairly long and difficult 
proof, and it can safely be omitted without affecting your ability to 
carry out orthogonal diagonalisation. You can skip it and proceed on to 
the next section, where we begin to look at applications of orthogonal 
diagonalisation. However, we include the proof for two reasons: first, 
for completeness and, second, because it draws on many of the most 
important ideas we have studied so far, so trying to understand it will 
be a good exercise. 

We will give a proof by induction on $n$, the size of the matrix. That
means we establish the theorem for the case $n = 1$ and we show that,
for $n \geq 2$, if the theorem holds for all symmetric $(n-1) \times (n-1)$
matrices, then it will also be true for $n \times n$ matrices. (So, the $n = 2$
case then follows from the $n = 1$ case; the $n = 3$ from the $n = 2$, and
so on.)


Proof of Theorem 11.5 Let’s just remind ourselves what it is we are
trying to prove. It is that, for any symmetric matrix $A$, there is an
orthogonal matrix $P$ and a diagonal matrix $D$ such that $P^TAP = D$.
Any $1 \times 1$ symmetric matrix is already diagonal, so we can take
$P = I$, which is an orthogonal matrix. So the result is true when $n = 1$.
Now let us consider a general value of $n \geq 2$ and assume that the
theorem holds for all $(n-1) \times (n-1)$ symmetric matrices. Let $A$ be
any $n \times n$ symmetric matrix. As mentioned above, we take for granted
now (and will prove in Chapter 13) the fact that $A$ has real eigenvalues.
Let $\lambda_1$ be any eigenvalue of $A$ and let $\mathbf{v}_1$ be a corresponding eigenvector
which satisfies $\|\mathbf{v}_1\| = 1$. By Theorem 6.45, we can extend the basis
$\{\mathbf{v}_1\}$ of $\mathrm{Lin}\{\mathbf{v}_1\}$ to a basis $\{\mathbf{v}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_n\}$ of $\mathbb{R}^n$. We can then use
the Gram-Schmidt process to transform this into an orthonormal basis
$B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ of $\mathbb{R}^n$. (Remember that we chose $\mathbf{v}_1$ to be a unit
vector, so we can take it to be the first member of the orthonormal
basis.)

Let $P$ be the matrix whose columns are the vectors in $B$, with the
first column being $\mathbf{v}_1$. Then $P$ is orthogonal, by Theorem 10.18, and (by
Theorem 7.37) $P^TAP = P^{-1}AP$ represents the linear transformation
$T : \mathbf{x} \mapsto A\mathbf{x}$ in the basis $B$. But we know, by Theorem 7.36, that the first
column of $P^TAP$ will be the coordinate vector of $T(\mathbf{v}_1)$ with respect
to the basis $B$. Now, $T(\mathbf{v}_1) = A\mathbf{v}_1 = \lambda_1\mathbf{v}_1$, so this coordinate vector
is

\[
\begin{pmatrix} \lambda_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.
\]

It follows that, for some numbers $d_1, d_2, \ldots, d_{n-1}$ and $c_{(i,j)}$ for
$i, j = 1, \ldots, n-1$, $P^TAP$ takes the form

\[
P^TAP = \begin{pmatrix}
\lambda_1 & d_1 & \cdots & d_{n-1} \\
0 & c_{(1,1)} & \cdots & c_{(1,n-1)} \\
\vdots & \vdots & \ddots & \vdots \\
0 & c_{(n-1,1)} & \cdots & c_{(n-1,n-1)}
\end{pmatrix}.
\]

But $A$ is symmetric, and so therefore is $P^TAP$, since

\[
(P^TAP)^T = P^TA^TP = P^TAP.
\]

The fact that this matrix is symmetric has two immediate consequences:

• $d_1 = d_2 = \cdots = d_{n-1} = 0$;
• the $(n-1) \times (n-1)$ matrix $C = (c_{(i,j)})$ is symmetric.

So we can write

\[
P^TAP = \begin{pmatrix} \lambda_1 & \mathbf{0}^T \\ \mathbf{0} & C \end{pmatrix},
\]

where $\mathbf{0}$ is the all-zero vector of length $n-1$ and $C$ is a symmetric
$(n-1) \times (n-1)$ matrix.

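We are assuming that the theorem holds for $(n-1) \times (n-1)$
symmetric matrices, so it holds for the matrix $C$. That means there is some
orthogonal $(n-1) \times (n-1)$ matrix $R$ such that $R^TCR = D$, where $D$
is a diagonal matrix. Consider the $n \times n$ matrix

\[
Q = \begin{pmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & R \end{pmatrix}.
\]

This is an orthogonal matrix because the fact that $R$ is orthogonal means
columns $2, 3, \ldots, n$ of $Q$ are mutually orthogonal and of length 1; and,
furthermore, the first column is evidently orthogonal to all the other
columns, and also has length 1. Let $S = PQ$. Then $S$ is orthogonal
because $P$ and $Q$ are: we have

\[
S^{-1} = (PQ)^{-1} = Q^{-1}P^{-1} = Q^TP^T = (PQ)^T = S^T.
\]

Now, let us think about $S^TAS$. We have:

\[
\begin{aligned}
S^TAS &= (PQ)^TA(PQ) \\
&= Q^TP^TAPQ \\
&= Q^T(P^TAP)Q \\
&= \begin{pmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & R \end{pmatrix}^T \begin{pmatrix} \lambda_1 & \mathbf{0}^T \\ \mathbf{0} & C \end{pmatrix} \begin{pmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & R \end{pmatrix} \\
&= \begin{pmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & R^T \end{pmatrix} \begin{pmatrix} \lambda_1 & \mathbf{0}^T \\ \mathbf{0} & C \end{pmatrix} \begin{pmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & R \end{pmatrix} \\
&= \begin{pmatrix} \lambda_1 & \mathbf{0}^T \\ \mathbf{0} & R^TCR \end{pmatrix} \\
&= \begin{pmatrix} \lambda_1 & \mathbf{0}^T \\ \mathbf{0} & D \end{pmatrix},
\end{aligned}
\]

which is a diagonal matrix, because $D$ is diagonal. So we’re done! We
have established that there is an orthogonal matrix, $S$, such that $S^TAS$
is diagonal.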


Activity 11.18 In order to understand this proof fully, work through it
for a $2 \times 2$ symmetric matrix $A$. That is, assuming only that $A$ has real
eigenvalues, show that there is an orthogonal matrix $P$ and a diagonal
matrix $D$ such that $P^TAP = D$.


11.2 Quadratic forms 


A very useful application of orthogonal diagonalisation is to the analysis
of quadratic forms.


11.2.1 Quadratic forms


A quadratic form in two variables $x$ and $y$ is an expression of the form

\[
q(x, y) = ax^2 + 2cxy + by^2.
\]

This can be written as

\[
q = \mathbf{x}^TA\mathbf{x},
\]

where $A$ is the symmetric matrix $A = \begin{pmatrix} a & c \\ c & b \end{pmatrix}$ and
$\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}$, $\mathbf{x}^T = (x \;\; y)$.


Activity 11.19 Check this. Perform the matrix multiplication $\mathbf{x}^TA\mathbf{x}$ to
see how the expression $q(x, y)$ is obtained. Notice how the coefficients
of the expression of $q(x, y)$ correspond to the entries of $A$.


Of course, there are other ways of writing $q(x, y)$ as a product of
matrices, $\mathbf{x}^TB\mathbf{x}$, where $B$ is not symmetric, but these are of no interest
to us here; our focus is on the case where the matrix is symmetric. We
say that $q$ is written in matrix form when we express it as $q = \mathbf{x}^TA\mathbf{x}$,
where $A$ is symmetric.


Activity 11.20 Find an expression for $q(x, y) = \mathbf{x}^TB\mathbf{x}$, where $B$ is not
symmetric.


Here is a specific example of how a two-variable quadratic form can be
expressed in matrix form.


Example 11.21 The quadratic form $q = x^2 + xy + 3y^2$ in matrix form
is

\[
q = (x \;\; y) \begin{pmatrix} 1 & 1/2 \\ 1/2 & 3 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.
\]


More generally, we consider quadratic forms in n variables. 


Definition 11.22 (Quadratic form) A quadratic form in $n \geq 2$
variables is an expression of the form

\[
q = \mathbf{x}^TA\mathbf{x},
\]

where $A$ is a symmetric $n \times n$ matrix and $\mathbf{x} \in \mathbb{R}^n$.


Example 11.23 The following is a quadratic form in three variables:

\[
q(x_1, x_2, x_3) = 5x_1^2 + 10x_2^2 + 2x_3^2 + 4x_1x_2 + 2x_1x_3 - 6x_2x_3.
\]

In matrix form, it is $\mathbf{x}^TA\mathbf{x}$, where $\mathbf{x} = (x_1, x_2, x_3)^T$ and $A$ is the
symmetric matrix

\[
A = \begin{pmatrix} 5 & 2 & 1 \\ 2 & 10 & -3 \\ 1 & -3 & 2 \end{pmatrix}.
\]


Activity 11.24 Check this. 
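
One way to gain confidence in the matrix form of Example 11.23 is to compare the two expressions at a few sample points. The short Python sketch below does this with exact integer arithmetic; the function names `quadratic_form` and `q_expanded` and the choice of sample points are our own illustrative assumptions. Agreement at sample points is a sanity check, not a proof.

```python
def quadratic_form(A, x):
    """Evaluate q(x) = x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def q_expanded(x1, x2, x3):
    """The quadratic form of Example 11.23 written out term by term."""
    return (5 * x1**2 + 10 * x2**2 + 2 * x3**2
            + 4 * x1 * x2 + 2 * x1 * x3 - 6 * x2 * x3)

A = [[5, 2, 1], [2, 10, -3], [1, -3, 2]]
samples = [(1, 0, 0), (1, 1, 1), (2, -1, 3), (0, 4, -5)]

# Compare the matrix form with the expanded form at each sample point.
agree = all(quadratic_form(A, x) == q_expanded(*x) for x in samples)
print(agree)
```

Each off-diagonal coefficient is split equally between the two symmetric positions: for instance, the term $4x_1x_2$ contributes 2 to both the $(1, 2)$ and $(2, 1)$ entries of $A$.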


You should be able to write down the $n \times n$ symmetric matrix $A$ from
the expression of the quadratic form, and conversely, without having
to multiply out the matrices. If you don’t already understand the
correspondence between $A$ and the expression $q(x_1, x_2, \ldots, x_n)$ (that is,
how to obtain one from the other by inspection), take a general $3 \times 3$
symmetric matrix with $(i, j)$ entry equal to $a_{ij}$ and calculate explicitly
what the product $\mathbf{x}^TA\mathbf{x}$ equals, where $\mathbf{x} = (x_1, x_2, x_3)^T$. You will find
that the diagonal entries of $A$ are the coefficients of the corresponding
$x_i^2$ terms, and that the coefficient of the $x_ix_j$ term comes from the sum
of the entries $a_{ij}$ and $a_{ji}$, where $a_{ij} = a_{ji}$, since $A$ is symmetric. Try
the following activity.


Activity 11.25 As practice, write down an expression for $q(x, y, z) = \mathbf{x}^TB\mathbf{x}$,
where $\mathbf{x} = (x, y, z)^T$ and $B$ is the symmetric matrix


27 De Sl 
a=(2 7 ‘). 
Sia a 


11.2.2 Definiteness of quadratic forms 


Consider the quadratic form $q_1(x, y) = x^2 + y^2$. For any choices of $x$
and $y$, $q_1(x, y) \geq 0$ and, furthermore, $q_1(x, y) = 0$ only when $x = y = 0$.
On the other hand, the quadratic form $q_2(x, y) = x^2 + 3xy + y^2$ is
not always non-negative: note, for example, that $q_2(1, -1) = -1 < 0$.
An important general question we might ask (and one which has useful
applications) is whether a quadratic form $q$ is always positive (except
when $x = y = 0$). Here, eigenvalue techniques help: specifically, we
can use orthogonal diagonalisation. First, we need some terminology.


Definition 11.26 Suppose that $q(\mathbf{x})$ is a quadratic form. Then:

• $q(\mathbf{x})$ is positive definite if $q(\mathbf{x}) \geq 0$ for all $\mathbf{x}$, and $q(\mathbf{x}) = 0$ only when
$\mathbf{x} = \mathbf{0}$, the zero-vector,
• $q(\mathbf{x})$ is positive semi-definite if $q(\mathbf{x}) \geq 0$ for all $\mathbf{x}$,
• $q(\mathbf{x})$ is negative definite if $q(\mathbf{x}) \leq 0$ for all $\mathbf{x}$, and $q(\mathbf{x}) = 0$ only
when $\mathbf{x} = \mathbf{0}$, the zero-vector,
• $q(\mathbf{x})$ is negative semi-definite if $q(\mathbf{x}) \leq 0$ for all $\mathbf{x}$,
• $q(\mathbf{x})$ is indefinite if it is neither positive definite, nor positive
semi-definite, nor negative definite, nor negative semi-definite; in other
words, if there are $\mathbf{x}_1$, $\mathbf{x}_2$ such that $q(\mathbf{x}_1) < 0$ and $q(\mathbf{x}_2) > 0$.


Consider the quadratic form $q = \mathbf{x}^TA\mathbf{x}$, where $A$ is symmetric, and
suppose that we have found $P$ that will orthogonally diagonalise $A$; that
is, which is such that $P^T = P^{-1}$ and $P^TAP = D$, where $D$ is a diagonal
matrix. We make the (usual) change of variable as follows: define $\mathbf{z}$
by $\mathbf{x} = P\mathbf{z}$ (or, equivalently, $\mathbf{z} = P^{-1}\mathbf{x} = P^T\mathbf{x}$). $P$ is the transition
matrix from coordinates in the orthonormal basis of eigenvectors of $A$
to standard coordinates. Then

\[
q = \mathbf{x}^TA\mathbf{x} = (P\mathbf{z})^TA(P\mathbf{z}) = \mathbf{z}^T(P^TAP)\mathbf{z} = \mathbf{z}^TD\mathbf{z}.
\]

Now, the entries of $D$ must be the eigenvalues of $A$: let us suppose
these are (in the order in which they appear in $D$) $\lambda_1, \lambda_2, \ldots, \lambda_n$. Let
$\mathbf{z} = (z_1, z_2, \ldots, z_n)^T$ be the coordinates in the orthonormal basis of
eigenvectors. Then

\[
q = \mathbf{z}^TD\mathbf{z} = \lambda_1z_1^2 + \lambda_2z_2^2 + \cdots + \lambda_nz_n^2.
\]

This is a linear combination of squares.
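
To illustrate the change of variable, the following Python sketch evaluates a quadratic form in two ways: directly as $\mathbf{x}^TA\mathbf{x}$, and as $\lambda_1z_1^2 + \lambda_2z_2^2$ with $\mathbf{z} = P^T\mathbf{x}$. It reuses the matrix $A$ and the orthogonal matrix $P$ of Example 11.4; the function names and sample points are hypothetical choices of ours, and the two values agree only up to floating-point rounding.

```python
import math

s = 1 / math.sqrt(2)
A = [[5, -3], [-3, 5]]     # symmetric matrix of Example 11.4
P = [[s, -s], [s, s]]      # columns: unit eigenvectors for the eigenvalues 2 and 8
eigenvalues = [2, 8]

def q_standard(x):
    """Evaluate q = x^T A x directly in standard coordinates."""
    return sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))

def q_diagonal(x):
    """Evaluate q as the sum of lambda_k * z_k^2, where z = P^T x."""
    z = [sum(P[i][k] * x[i] for i in range(2)) for k in range(2)]
    return sum(lam * zk ** 2 for lam, zk in zip(eigenvalues, z))

for x in [(1, 2), (3, -1), (0.5, 4)]:
    print(x, q_standard(x), q_diagonal(x))
```

Since both eigenvalues are positive, every term $\lambda_kz_k^2$ is non-negative, which is the observation the next paragraph builds on.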

Now suppose that all the eigenvalues are positive. Then we can
conclude that, for all $\mathbf{z}$, $q \geq 0$, and also that $q = 0$ only when $\mathbf{z}$ is
the zero-vector. But because of the way in which $\mathbf{x}$ and $\mathbf{z}$ are related
($\mathbf{x} = P\mathbf{z}$ and $\mathbf{z} = P^T\mathbf{x}$), $\mathbf{x} = \mathbf{0}$ if and only if $\mathbf{z} = \mathbf{0}$. Therefore, if all
the eigenvalues are positive, the quadratic form is positive definite.
Conversely, assume the quadratic form is positive definite, so that $\mathbf{x}^TA\mathbf{x} > 0$
for all $\mathbf{x} \neq \mathbf{0}$. If $\mathbf{u}_i$ is a unit eigenvector corresponding to the eigenvalue
$\lambda_i$, then

\[
\mathbf{u}_i^TA\mathbf{u}_i = \mathbf{u}_i^T\lambda_i\mathbf{u}_i = \lambda_i\mathbf{u}_i^T\mathbf{u}_i = \lambda_i\|\mathbf{u}_i\|^2 = \lambda_i > 0.
\]

So the eigenvalues of $A$ are positive. Therefore, we have the first part
of the following result. (The other parts arise from similar reasoning.)


Theorem 11.27 Suppose that the quadratic form $q(\mathbf{x})$ has matrix
representation $q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$. Then:

• $q$ is positive definite if and only if all eigenvalues of $A$ are positive,
• $q$ is positive semi-definite if and only if all eigenvalues of $A$ are
non-negative,
• $q$ is negative definite if and only if all eigenvalues of $A$ are negative,
• $q$ is negative semi-definite if and only if all eigenvalues of $A$ are
non-positive,
• $q$ is indefinite if and only if some eigenvalues of $A$ are negative, and
some are positive.
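
Theorem 11.27 translates directly into a small classification routine. The Python sketch below assumes the eigenvalues are already known exactly (it does not compute them), and the function name `classify` is our own. Because the categories overlap (a positive definite form is also positive semi-definite), the function returns the most specific description; for the zero matrix, which is both positive and negative semi-definite, it arbitrarily reports the former.

```python
def classify(eigenvalues):
    """Classify a quadratic form from the eigenvalues of its symmetric matrix."""
    if all(lam > 0 for lam in eigenvalues):
        return "positive definite"
    if all(lam < 0 for lam in eigenvalues):
        return "negative definite"
    if all(lam >= 0 for lam in eigenvalues):
        return "positive semi-definite"
    if all(lam <= 0 for lam in eigenvalues):
        return "negative semi-definite"
    # Otherwise there is at least one positive and one negative eigenvalue.
    return "indefinite"

print(classify([2, 8]))       # eigenvalues from Example 11.4
print(classify([4, 0, 12]))   # eigenvalues from Example 11.10
print(classify([-1, 3]))      # one negative and one positive eigenvalue
```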


Activity 11.28 Assume an $n \times n$ matrix $A$ has 0 as one of its
eigenvalues and that all other eigenvalues are non-negative. Show that $q = \mathbf{x}^TA\mathbf{x}$
is positive semi-definite but not positive definite.


We say that a symmetric matrix $A$ is positive definite if the
corresponding quadratic form $q = \mathbf{x}^TA\mathbf{x}$ is positive definite (and, similarly,
we speak of negative definite, positive semi-definite, negative
semi-definite, and indefinite matrices).

As a consequence of Theorem 11.27, in order to establish if a matrix 
is positive definite or negative definite, we only need to know the signs 
of the eigenvalues and not their values. It is possible to obtain this 
information directly from the matrix A. 

We first examine the case where $A$ is a symmetric $2 \times 2$ matrix,

\[
A = \begin{pmatrix} a & c \\ c & b \end{pmatrix}.
\]

Let $\lambda_1$ and $\lambda_2$ be the eigenvalues of $A$. The characteristic
equation of $A$ is

\[
|A - \lambda I| = \begin{vmatrix} a - \lambda & c \\ c & b - \lambda \end{vmatrix} = \lambda^2 - (a + b)\lambda + (ab - c^2) = 0.
\]

Since the eigenvalues $\lambda_1$ and $\lambda_2$ are the roots of the characteristic
equation, we have

\[
|A - \lambda I| = (\lambda - \lambda_1)(\lambda - \lambda_2) = \lambda^2 - (\lambda_1 + \lambda_2)\lambda + \lambda_1\lambda_2 = 0.
\]

Comparing terms of these two polynomials in $\lambda$, we have

\[
\lambda_1\lambda_2 = ab - c^2 \quad \text{and} \quad \lambda_1 + \lambda_2 = a + b.
\]

These observations are, in fact, simply special cases of Theorem 8.11
and Theorem 8.15: namely, that the determinant of $A$ is the product of
the eigenvalues (explicitly, in this case, $ab - c^2 = \lambda_1\lambda_2$); and that the
trace of $A$ is the sum of the eigenvalues ($a + b = \lambda_1 + \lambda_2$).

If $|A| = ab - c^2 > 0$, then both eigenvalues $\lambda_1$, $\lambda_2$ have the same
sign (since their product, which is equal to $|A|$, is positive). Since also
$a$ and $b$ must have the same sign in this case (since $ab > c^2 \geq 0$), the
sum $\lambda_1 + \lambda_2 = a + b$ has the same sign as $a$, so we
can deduce the signs of the eigenvalues from the sign of $a$.

Consider the following example:


Example 11.29 Let $A$ be the following matrix:

\[
A = \begin{pmatrix} 9 & -2 \\ -2 & 6 \end{pmatrix}.
\]

Then (because it is symmetric) $A$ has real eigenvalues $\lambda_1$, $\lambda_2$ which
must satisfy the following equations:

\[
\lambda_1\lambda_2 = |A| = 9(6) - (-2)(-2) = 50, \quad \text{and} \quad \lambda_1 + \lambda_2 = 9 + 6 = 15.
\]

Since $\lambda_1\lambda_2 > 0$, the eigenvalues are non-zero and have the same sign.
Since $\lambda_1 + \lambda_2 > 0$, both must be positive. Therefore, the matrix $A$ is
positive definite. (In fact, the eigenvalues are 5 and 10, but we do not
need to do the extra work of finding them explicitly if we only want to
know about their signs.)


In a similar way, if $|A| = ab - c^2 < 0$, then the eigenvalues have
opposite signs and the form is therefore indefinite. So we can
conclude:

• If $|A| > 0$ and $a > 0$, then $\lambda_1 > 0$, $\lambda_2 > 0$ and $A$ is positive definite.
• If $|A| > 0$ and $a < 0$, then $\lambda_1 < 0$, $\lambda_2 < 0$ and $A$ is negative definite.
• If $|A| < 0$, then $\lambda_1$ and $\lambda_2$ have opposite signs and $A$ is indefinite.
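
The three cases above amount to a simple test that needs only the determinant and the top-left entry. Here is a minimal Python sketch; the function name `classify_2x2` and the returned strings are illustrative choices of ours, and the inputs are assumed to be exact (integers or fractions) so that the comparisons with zero are reliable.

```python
def classify_2x2(a, b, c):
    """Classify the symmetric matrix [[a, c], [c, b]] using |A| = ab - c^2 and the sign of a."""
    det = a * b - c * c
    if det > 0:
        # Here ab > c^2 >= 0, so a is non-zero and its sign decides the case.
        return "positive definite" if a > 0 else "negative definite"
    if det < 0:
        return "indefinite"
    return "one eigenvalue is 0"

print(classify_2x2(9, 6, -2))    # the matrix of Example 11.29
print(classify_2x2(1, 1, 1.5))   # q2(x, y) = x^2 + 3xy + y^2 from Section 11.2.2
```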


If |A| = 0, we conclude that one of the eigenvalues is 0. 
This kind of test on the matrix can be generalised to an n x n 
symmetric matrix A. But first we need a definition. 


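Definition 11.30 If $A$ is an $n \times n$ matrix, the principal minors of $A$ are
the $n$ determinants formed from the first $r$ rows and the first $r$ columns
of $A$, for $r = 1, 2, \ldots, n$; that is,

\[
a_{11}, \quad
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \quad \ldots, \quad
\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}.
\]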

Notice that the $n \times n$ principal minor is just the determinant of $A$. If,
for example,

\[
A = \begin{pmatrix} 5 & 2 & 1 \\ 2 & 10 & -3 \\ 1 & -3 & 2 \end{pmatrix},
\]

then the principal minors are

\[
a_{11} = 5, \quad \begin{vmatrix} 5 & 2 \\ 2 & 10 \end{vmatrix} = 46, \quad |A| = 25.
\]


Activity 11.31 Check these determinant calculations.


Notice that all three principal minors are positive. In fact, this is
enough to show that $A$ is positive definite, as stated in the following
result.


Theorem 11.32 Suppose that $A$ is an $n \times n$ symmetric matrix. Then $A$
is positive definite if and only if all its principal minors are positive.


We will prove this result in the next section. For now, let’s assume it is
true and look at some of the consequences.

Theorem 11.32 gives us a test to see if a matrix is positive definite.
What about the other possibilities?

A matrix $A$ is negative definite if and only if its negative, $-A$, is
positive definite. (You can see this by noting that the quadratic form
determined by $-A$ is the negative of that determined by $A$.) Now, if $A_r$
is any $r \times r$ matrix, then $|-A_r| = (-1)^r|A_r|$.


Activity 11.33 Show this using properties of the determinant. 


So if $r$ is even, the $r \times r$ principal minor of $A$ (the principal minor
of order $r$) and that of $-A$ have the same sign, and if $r$ is odd, they
have opposite signs. If $-A$ is positive definite, Theorem 11.32 tells us
that all of its principal minors are positive. So we have the following
characterisation of a negative definite matrix.


Theorem 11.34 Suppose that $A$ is an $n \times n$ symmetric matrix. Then $A$
is negative definite if and only if its principal minors of even order are
positive and its principal minors of odd order are negative.


Another way of stating this is: the symmetric $n \times n$ matrix $A$ is negative
definite if and only if its principal minors alternate in sign, with the first
one negative.
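
The tests of Theorem 11.32 and Theorem 11.34 can be sketched in Python as follows. This is an illustrative implementation under our own assumptions: the determinant is computed by Gaussian elimination with exact rational arithmetic (`fractions.Fraction`), which is one of several possible methods, and the function names are hypothetical. The input is assumed to be a symmetric matrix with integer or rational entries.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result          # a row swap changes the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return result

def principal_minors(m):
    """Determinants of the top-left r x r submatrices, for r = 1, ..., n."""
    return [det([row[:r] for row in m[:r]]) for r in range(1, len(m) + 1)]

def is_positive_definite(m):
    return all(d > 0 for d in principal_minors(m))

def is_negative_definite(m):
    # Odd-order minors must be negative, even-order minors positive.
    return all((d < 0) if r % 2 == 1 else (d > 0)
               for r, d in enumerate(principal_minors(m), start=1))

A = [[5, 2, 1], [2, 10, -3], [1, -3, 2]]
minus_A = [[-x for x in row] for row in A]
print(principal_minors(A), is_positive_definite(A))
print(principal_minors(minus_A), is_negative_definite(minus_A))
```

As the text goes on to explain, a failure of both tests does not by itself settle whether the matrix is semi-definite or indefinite.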


Activity 11.35 Convince yourself that these two statements are 
equivalent. 


If $A$ is an $n \times n$ symmetric matrix which is neither positive nor negative
definite, and if $|A| \neq 0$, then $A$ is indefinite because we can conclude
that $A$ has both positive and negative eigenvalues. If $|A| = 0$, the only
thing we can conclude is that one of the eigenvalues is 0. These
statements follow from Theorem 8.11, which states that if $A$ has eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_n$, then

\[
|A| = \lambda_1\lambda_2\cdots\lambda_n.
\]


Activity 11.36 Explain why Theorem 8.11 establishes the following: if
$A$ is an $n \times n$ symmetric matrix and $|A| = 0$, then one of the eigenvalues
is 0; and if $A$ is neither positive nor negative definite and $|A| \neq 0$, then
$A$ has both positive and negative eigenvalues and is therefore indefinite.


It should be noted that there is no test quite as simple as those of
Theorem 11.32 and Theorem 11.34 to check whether a matrix is positive
or negative semi-definite. Consider the following example.


Example 11.37 Let $A$ be the matrix

\[
A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & t \end{pmatrix}.
\]

Solving $|A - \lambda I| = 0$, we find that the eigenvalues are $0, 2, t$. The
principal minors of $A$ are

\[
a_{11} = 1, \quad \begin{vmatrix} 1 & 1 \\ 1 & 1 \end{vmatrix} = 0, \quad |A| = 0.
\]

But $t$ can be any real number, either positive or negative. So in this case
the principal minors are no indication of the signs of the eigenvalues.


11.2.3 The characterisation of positive-definiteness 


Before we embark on the proof of Theorem 11.32, there are two general results which we
will need, so we state them and prove them now. The first is really an
observation:

If $D$ is an $n \times n$ diagonal matrix with positive entries on the
diagonal, then $D$ is positive definite.


Activity 11.38 Prove this.


The second we will state as a theorem.


Theorem 11.39 If $A$ and $B$ are any $n \times n$ symmetric matrices such
that $EAE^T = B$ for an invertible matrix $E$, then $A$ is positive definite
if and only if $B$ is positive definite.


Proof: To see this, assume $B$ is positive definite. Let $\mathbf{x} \in \mathbb{R}^n$ and let
$\mathbf{y} = (E^T)^{-1}\mathbf{x}$ (or, equivalently, $\mathbf{x} = E^T\mathbf{y}$). Then $\mathbf{x} = \mathbf{0}$ if and only if
$\mathbf{y} = \mathbf{0}$. Since $B$ is positive definite, we have, for all $\mathbf{x} \neq \mathbf{0}$,

\[
\mathbf{x}^TA\mathbf{x} = (E^T\mathbf{y})^TA(E^T\mathbf{y}) = \mathbf{y}^TEAE^T\mathbf{y} = \mathbf{y}^TB\mathbf{y} > 0,
\]

so $A$ is also positive definite. The converse follows immediately by
noting that

\[
A = E^{-1}B(E^T)^{-1} = E^{-1}B(E^{-1})^T = FBF^T, \quad \text{where } F = E^{-1},
\]

so if A is positive definite then so is B. 
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Proof of Theorem 11.32 To prove this, we (once again) need to use an 
inductive proof on the size n of ann x n symmetric matrix A. We have 
already shown that the result is true for a 2 x 2 matrix by looking at 
the trace and the determinant of the matrix. We will prove this again in 
a different way so that we can extend the proof to n x n matrices. You 
can safely omit this proof. However, we include it for completeness and 
because it uses ideas from earlier chapters, namely row operations and 
elementary matrices. 

We want to show that a symmetric matrix A is positive definite if 
and only if all its principal minors are positive. 

We will first prove the difficult part of the ‘if and only if’ statement: 
assuming that all the principal minors of the matrix A are positive, 
we will show that A is positive definite. First, we do this for a 2 x 2 
matrix, and then we show how this implies the statement for a 3 x 3 
matrix. After that, assuming that the result is true for (n — 1) x (n — 1) 
matrices, it is not difficult to show it is true for n x n matrices. The 
main idea in this proof is to use carefully chosen row (and column) 
operations to diagonalise the matrix A. 

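Let $A$ be a $2 \times 2$ symmetric matrix,

\[ A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}, \qquad a, b, c \in \mathbb{R}, \]

with positive principal minors. Then $a > 0$ and $|A| > 0$. We perform
the following row operation on $A$ by multiplying it on the left by an
elementary matrix:

\[ EA = \begin{pmatrix} 1 & 0 \\ -b/a & 1 \end{pmatrix} \begin{pmatrix} a & b \\ b & c \end{pmatrix} = \begin{pmatrix} a & b \\ 0 & c - b^2/a \end{pmatrix} = \begin{pmatrix} a & b \\ 0 & |A|/a \end{pmatrix}. \]

We then perform the analogous column operation on this, by multiplying
on the right by the matrix $E^T$:

\[ EAE^T = \begin{pmatrix} a & b \\ 0 & |A|/a \end{pmatrix} \begin{pmatrix} 1 & -b/a \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a & 0 \\ 0 & |A|/a \end{pmatrix} = D. \]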


It turns out that the diagonal matrix $EAE^T = D$ has the same principal
minors as the matrix $A$. For the first principal minor of $D$ (which is just
the $(1, 1)$ entry) is equal to $a$, since this was unchanged; and the $2 \times 2$
principal minor of $D$ is $|D|$, where

\[ |D| = |EAE^T| = |E|\,|A|\,|E^T| = |A|, \quad \text{since } |E| = |E^T| = 1. \]

Note that, as a consequence of our method, the diagonal entries of
$D = (d_{ij})$ are

\[ d_{11} = a \quad \text{and} \quad d_{22} = \frac{|A|}{a}. \]

So $D = EAE^T$ is a diagonal matrix with all its diagonal entries positive.
Therefore, $D$ is positive definite, and by Theorem 11.39, $A$ is positive
definite.
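
The $2 \times 2$ reduction above can be checked with exact arithmetic. The following is a minimal sketch in Python, assuming a symmetric matrix with $a \neq 0$; the function names and the sample entries are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]

def diagonalise_2x2(a, b, c):
    """Return D = E A E^T for A = [[a, b], [b, c]], assuming a != 0."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    A = [[a, b], [b, c]]
    # E performs the row operation: row 2 - (b/a) * row 1
    E = [[Fraction(1), Fraction(0)], [-b / a, Fraction(1)]]
    return matmul(matmul(E, A), transpose(E))

# Sample matrix (an arbitrary choice): a = 2, b = 1, c = 3, so |A| = 2*3 - 1*1 = 5
D = diagonalise_2x2(2, 1, 3)
print(D)
```

For this sample the off-diagonal entries vanish and the $(2, 2)$ entry is $|A|/a = 5/2$, in line with the formula for $d_{22}$.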

In order to continue, we introduce some notation. If $A$ is an $n \times n$
matrix, let $A_{r \times r}$ denote the $r \times r$ matrix consisting of the first $r$ rows
and $r$ columns of $A$. Then the principal minors of $A$ are

\[ a_{11} = |A_{1 \times 1}|, \ |A_{2 \times 2}|, \ |A_{3 \times 3}|, \ \ldots, \ |A_{n \times n}| = |A|. \]


The idea of this proof is to reduce the n x n matrix A to a diagonal 
matrix in the same way as we did for the 2 x 2 matrix, using only row 
operations which add a multiple of one row to another (and also by 
using corresponding column operations). An elementary matrix which 
corresponds to this type of row operation is, of course, invertible, and, 
most importantly, it has determinant equal to 1. 

Let $E_{i1}$ denote the elementary matrix that performs the row operation
'row $i$ $-$ $(a_{i1}/a_{11})$ row 1', where the size of this elementary matrix
will depend on the size of the matrix on which the row operation is
being performed. For example, if $A$ is a $2 \times 2$ matrix, then $E_{21}$ is just
the matrix $E$ we used above.

If $A$ is a $3 \times 3$ matrix, then, for instance, $E_{21}A$ is

\[ \begin{pmatrix} 1 & 0 & 0 \\ -(a_{21}/a_{11}) & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} - (a_{21}a_{12}/a_{11}) & * \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \]

where we have written $*$ to indicate the $(2, 3)$ entry in $E_{21}A$. Notice that
the $(2, 2)$ entry of $E_{21}A$ is already equal to the determinant of the $2 \times 2$
principal minor of $A$ divided by the entry $a_{11}$, so the $2 \times 2$ principal
minors of the matrices $E_{21}A$ and $A$ are the same.

We now show how the $2 \times 2$ result implies that the theorem is
also true for a $3 \times 3$ matrix. We apply the elementary row and column
operations (as indicated above) to $A$ to reduce the $2 \times 2$ principal minor
to diagonal form:

\[ E_{21} A E_{21}^T = \begin{pmatrix} d_{11} & 0 & * \\ 0 & d_{22} & * \\ * & * & * \end{pmatrix}, \]

so that the first two principal minors of the matrix $A$ and the matrix
$E_{21} A E_{21}^T$ are equal. Then $d_{22} = |A_{2 \times 2}|/a_{11}$. We now continue reducing
the matrix to diagonal form using the same type of elementary row
operations (adding a multiple of one row to another) and the corresponding
column operations. All of the elementary matrices which perform
this type of operation have determinant equal to 1. We have

\[ E_{32} E_{31} E_{21} A E_{21}^T E_{31}^T E_{32}^T = \begin{pmatrix} d_{11} & 0 & 0 \\ 0 & d_{22} & 0 \\ 0 & 0 & d_{33} \end{pmatrix} = D, \]

where $E_{32}$ is the elementary matrix that performs the (obvious) row
operation needed to complete the diagonalisation. Now we already
know that the first two principal minors of these matrices are equal, and
since

\[ |D| = |E_{32} E_{31} E_{21} A E_{21}^T E_{31}^T E_{32}^T| = |E_{32}|\,|E_{31}|\,|E_{21}|\,|A|\,|E_{21}^T|\,|E_{31}^T|\,|E_{32}^T| = |A|, \]

all the principal minors are equal.


In addition, since each principal minor of $D$ is just the product of
the diagonal entries, we can deduce that the entries of $D$ are

\[ d_{11} = a_{11}, \qquad d_{22} = \frac{|A_{2 \times 2}|}{a_{11}}, \qquad d_{33} = \frac{|A_{3 \times 3}|}{|A_{2 \times 2}|}. \]

For we know from above that $d_{11} = a_{11}$ and $d_{22} = |A_{2 \times 2}|/a_{11}$; and the
fact that $d_{33}$ takes the value indicated then follows directly from the
observation that

\[ |D| = d_{11} d_{22} d_{33} = |A| = |A_{3 \times 3}|. \]


Since the diagonal entries of D are positive, we conclude as earlier that 
D is positive definite, and therefore A is positive definite. 

We are now ready to consider an n x n symmetric matrix A, 
assuming the result is true for any (n — 1) x (n — 1) matrix. We 
apply the elementary row and column operations to A to reduce the 
(n — 1) x (n — 1) principal minor to diagonal form, and then continue 
to reduce the matrix $A$ so that we obtain a matrix

\[ EAE^T = \operatorname{diag}(d_{11}, d_{22}, \ldots, d_{nn}) = D, \]

where we used $E$ to denote the product of the elementary matrices which
achieve this diagonalisation. All of these elementary matrices have
determinant equal to 1, and therefore so does $E$. The method ensures
(by the underlying assumption about the $(n - 1) \times (n - 1)$ case) that
the first $n - 1$ principal minors of the matrices $A$ and $D$ are the same. It
only remains to show that $|A| = |D|$, which follows immediately from

\[ |D| = |EAE^T| = |E|\,|A|\,|E^T| = |A|. \]




Therefore, using the same arguments as earlier, D is a diagonal matrix 
with positive entries along the diagonal. Therefore D is positive definite, 
and so therefore is A. 
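
The reduction described in this proof can be sketched in code. The following Python example is only an illustration under stated assumptions: the matrix is symmetric with non-zero leading principal minors (so no pivot vanishes), and all function names and the sample matrix are hypothetical choices, not taken from the text. It applies each row operation together with its matching column operation and compares the resulting diagonal entries with ratios of consecutive principal minors.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def leading_principal_minors(A):
    """Return [|A_1x1|, |A_2x2|, ..., |A_nxn|]."""
    return [det([row[:r] for row in A[:r]]) for r in range(1, len(A) + 1)]

def congruence_diagonal(A):
    """Diagonal of D = E A E^T, using only 'row i - multiple of row k' operations
    and the matching column operations. Assumes A is symmetric and no pivot is zero."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    for k in range(n):
        if M[k][k] == 0:
            raise ValueError("zero pivot: this sketch assumes non-zero leading principal minors")
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(n):      # row operation: row i - factor * row k
                M[i][j] -= factor * M[k][j]
            for j in range(n):      # column operation: column i - factor * column k
                M[j][i] -= factor * M[j][k]
    return [M[i][i] for i in range(n)]

def is_positive_definite(A):
    """Principal minors test for a symmetric matrix A."""
    return all(m > 0 for m in leading_principal_minors(A))

# Sample symmetric matrix (an arbitrary choice)
A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
minors = leading_principal_minors(A)
diagonal = congruence_diagonal(A)
print(minors, diagonal)
```

For this sample the principal minors are 2, 3 and 4, and the diagonal entries come out as 2, 3/2 and 4/3, that is, each entry after the first is the ratio of two consecutive principal minors.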

To complete the proof, we show that if the quadratic form is positive
definite, then all the principal minors are positive. Recall that the
principal minors of $A$ are

\[ a_{11} = |A_{1 \times 1}|, \ |A_{2 \times 2}|, \ |A_{3 \times 3}|, \ \ldots, \ |A_{n \times n}| = |A|, \]

where $A_{r \times r}$ denotes the $r \times r$ matrix consisting of the first $r$ rows and
$r$ columns of $A$. We will prove that $A_{r \times r}$ is a positive definite $r \times r$
matrix for $r = 1, 2, \ldots, n - 1$. We already know this to be the case for
$r = n$. It will then follow, by Theorem 11.27, that the eigenvalues of
$A_{r \times r}$ are all positive. Then Theorem 8.11 will tell us that $|A_{r \times r}|$ is the
product of these positive eigenvalues, and is therefore positive. So, let's
show that $A_{r \times r}$ is positive definite, using the fact that $A$ is.

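We know that, for all $\mathbf{y} \in \mathbb{R}^n$, $\mathbf{y}^T A \mathbf{y} > 0$, unless $\mathbf{y} = \mathbf{0}$. Fix $r$, a
number between 1 and $n - 1$. Let $\mathbf{x} = (x_1, x_2, \ldots, x_r)^T \in \mathbb{R}^r$ and let
$\mathbf{x}_r = (x_1, x_2, \ldots, x_r, 0, 0, \ldots, 0)^T \in \mathbb{R}^n$ be the $n$-vector with first $r$
entries the same as $\mathbf{x}$, and all other entries zero.

Suppose $\mathbf{x}$ is a non-zero vector. Then so is $\mathbf{x}_r$, and we must have
$\mathbf{x}_r^T A \mathbf{x}_r > 0$. But

\[ \mathbf{x}_r^T A \mathbf{x}_r = (x_1 \ \cdots \ x_r \ 0 \ \cdots \ 0) \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1r} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2r} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rr} & \cdots & a_{rn} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nr} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_r \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \]

Think about how this product evaluates. It is a $1 \times 1$ matrix. Because
of the zero entries in $\mathbf{x}_r$, and because of the way in which matrix
multiplication works, we have, for $\mathbf{x} = (x_1, x_2, \ldots, x_r)^T \in \mathbb{R}^r$,

\[ \mathbf{x}_r^T A \mathbf{x}_r = \mathbf{x}^T A_{r \times r} \mathbf{x}. \]

So, since $A$ is positive definite, for all $\mathbf{x} \in \mathbb{R}^r$ with $\mathbf{x} \neq \mathbf{0}$,

\[ \mathbf{x}^T A_{r \times r} \mathbf{x} = \mathbf{x}_r^T A \mathbf{x}_r > 0. \]

So, indeed, $A_{r \times r}$ is positive definite, as required.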
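
The key identity in this argument, that padding a vector with zeros picks out the leading $r \times r$ block, can be checked on a small case. This is a minimal Python sketch; the helper names, the sample matrix and the sample vector are assumptions made for illustration only.

```python
def quad_form(A, x):
    """Evaluate x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def leading_block(A, r):
    """The r x r matrix formed by the first r rows and first r columns of A."""
    return [row[:r] for row in A[:r]]

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]   # sample symmetric matrix
x = [3, -4]                                  # a vector in R^2, so r = 2
x_padded = x + [0] * (len(A) - len(x))       # the n-vector (3, -4, 0)

lhs = quad_form(A, x_padded)                 # x_r^T A x_r
rhs = quad_form(leading_block(A, 2), x)      # x^T A_{2x2} x
print(lhs, rhs)
```

Both sides evaluate to the same number, because every term involving a padded zero drops out of the sum.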


Figure 11.1 The graphs of (left) $x^2 - 2y^2 = 2$ and (right) $y^2 - 2x^2 = 2$

11.2.4 Quadratic forms in $\mathbb{R}^2$: conic sections


Conic sections are traditionally described as curves which can be 
obtained as the intersection of a plane and a double cone, such as a 
circle, ellipse, parabola or hyperbola. It is more common, however, to 
think of them as defined by certain types of equation and here there 
is a very useful link to quadratic forms. The technique of orthogonal 
diagonalisation enables us to determine what type of conic section we 
have, and to sketch them accurately. 

If $A$ is a $2 \times 2$ symmetric matrix, the equation $\mathbf{x}^T A \mathbf{x} = k$, where $k$
is a constant, represents a curve whose graph in the $xy$-plane is a conic
section. For example, the equation

\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \]

represents an ellipse which intersects the $x$ axis at $(-a, 0)$ and $(a, 0)$,
and intersects the $y$ axis at $(0, b)$ and $(0, -b)$. If $a = b$, this is a circle
of radius $a$. These curves are said to be in standard position relative to
the coordinate axes (meaning that their axes of symmetry are the $x$ axis
and the $y$ axis), as are the two hyperbolas whose graphs are shown in
Figure 11.1.

The graphs of each of the hyperbolas $x^2 - 2y^2 = 2$ and $y^2 - 2x^2 = 2$
are shown in this figure, together with the two straight lines which are
the asymptotes of the hyperbola. From each equation, we see that if
$x$ is large, then $y$ must also be large, so that the difference in the
squared terms remains constant. For example, for the first hyperbola,
$x^2 - 2y^2 = 2$, the asymptotes can be easily found by rewriting this
equation as

\[ \frac{y^2}{x^2} = \frac{1}{2} - \frac{1}{x^2}. \]

Figure 11.2 The ellipse $x^2 + 2y^2 = 2$

As $x$ gets very large, $1/x^2 \to 0$, so the points on the hyperbola approach
the lines given by $y^2 = x^2/2$. The asymptotes are therefore

\[ y = \frac{1}{\sqrt{2}}\,x \quad \text{and} \quad y = -\frac{1}{\sqrt{2}}\,x. \]
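
The limiting behaviour of the hyperbola $x^2 - 2y^2 = 2$ can be illustrated numerically. The following Python sketch is an illustration only; the function name and the sample values of $x$ are arbitrary choices. It measures the vertical gap between the line $y = x/\sqrt{2}$ and the upper branch of the curve.

```python
import math

def hyperbola_y(x):
    """Positive y on the curve x^2 - 2y^2 = 2 (defined for |x| >= sqrt(2))."""
    return math.sqrt((x * x - 2) / 2)

slope = 1 / math.sqrt(2)   # slope of the asymptote y = x / sqrt(2)

# Vertical gap between the asymptote and the curve at increasing values of x
gaps = [slope * x - hyperbola_y(x) for x in (10.0, 100.0, 1000.0)]
print(gaps)
```

The gaps stay positive and shrink as $x$ grows, which is what it means for the line to be an asymptote.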
Activity 11.40 Find the equations of the asymptotes for the hyperbola
$y^2 - 2x^2 = 2$.


On the other hand, in the equation of an ellipse, such as

\[ x^2 + 2y^2 = 2, \]

the values of $x$ and $y$ are constrained by the equation: the largest value
that $x$ can attain is when $y = 0$.

If $A$ is a diagonal matrix, so that the equation $\mathbf{x}^T A \mathbf{x} = k$ has no
$xy$ term, then this equation represents a conic section in standard
position and it is straightforward to sketch its graph. For example,
if $A = \operatorname{diag}(1, -2)$, then the graph of $\mathbf{x}^T A \mathbf{x} = x^2 - 2y^2 = 2$ is the
hyperbola shown on the left in Figure 11.1, whereas if $A = \operatorname{diag}(1, 2)$,
then the graph of $\mathbf{x}^T A \mathbf{x} = x^2 + 2y^2 = 2$ is an ellipse which intersects
the $x$ axis at $x = \pm\sqrt{2}$ and the $y$ axis at $y = \pm 1$. This is shown in
Figure 11.2.

But how do we sketch the graph if A is not diagonal? To achieve this, 
we can use orthogonal diagonalisation. We illustrate the method using 
the following example (which you have seen before as Example 7.40). 


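Example 11.41 Consider the curve $C$ with equation

\[ 5x^2 - 6xy + 5y^2 = 2. \]

In matrix form, this equation is

\[ \mathbf{x}^T A \mathbf{x} = (x \ \ y) \begin{pmatrix} 5 & -3 \\ -3 & 5 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 2, \]

where $A$ is symmetric. We orthogonally diagonalised the matrix $A$ in
Example 11.4. We found that the eigenvalues are $\lambda_1 = 2$ and $\lambda_2 = 8$,
with corresponding eigenvectors

\[ \mathbf{v}_1 = \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}. \]

Now, if $P$ is the matrix

\[ P = \begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}, \]

then

\[ P^{-1}AP = P^T A P = D = \operatorname{diag}(2, 8). \]

Activity 11.42 Check these calculations.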
Activity 11.42 Check these calculations. 


We set $\mathbf{x} = P\mathbf{z}$, and interpret $P$ both as a change of basis and as a
linear transformation. The matrix $P$ is the transition matrix from the
(orthonormal) basis $B$ of eigenvectors, $B = \{\mathbf{v}_1, \mathbf{v}_2\}$, to the (orthonormal)
standard basis. If we let $\mathbf{z} = (X, Y)^T$, then, as we saw earlier,
the quadratic form, and hence the curve $C$, can be expressed in the
coordinates of the basis of eigenvectors as

\[ \mathbf{x}^T A \mathbf{x} = \mathbf{z}^T D \mathbf{z} = 2X^2 + 8Y^2 = 2; \]

that is, as $X^2 + 4Y^2 = 1$. This is an ellipse in standard position with
respect to the $X$ and $Y$ axes.
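
The change of variables in this example can be checked numerically. The sketch below is a minimal Python illustration, assuming that $P$ has as its columns the unit vectors in the directions $(1, 1)^T$ and $(-1, 1)^T$ that the example uses for the new axes; the helper names are hypothetical. It computes $P^T A P$ and then confirms that points of the ellipse $X^2 + 4Y^2 = 1$, mapped back by $\mathbf{x} = P\mathbf{z}$, satisfy the original equation.

```python
import math

A = [[5.0, -3.0], [-3.0, 5.0]]
s = 1 / math.sqrt(2)
P = [[s, -s], [s, s]]   # columns: unit vectors along (1, 1) and (-1, 1)

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

D = matmul(matmul(transpose(P), A), P)   # should be close to diag(2, 8)

def curve_value(x, y):
    """Left-hand side of the original equation 5x^2 - 6xy + 5y^2 = 2."""
    return 5 * x * x - 6 * x * y + 5 * y * y

def to_standard(X, Y):
    """Standard coordinates x = P z of the point with coordinates z = (X, Y)."""
    return (P[0][0] * X + P[0][1] * Y, P[1][0] * X + P[1][1] * Y)

# Points on X^2 + 4Y^2 = 1, parametrised as (cos t, sin(t) / 2)
values = [curve_value(*to_standard(math.cos(t), math.sin(t) / 2))
          for t in (0.0, 0.7, 2.0, 4.5)]
print(D, values)
```

Up to rounding error, `D` is the diagonal matrix with entries 2 and 8, and each entry of `values` equals 2.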

But how do we sketch this? We first need to find the positions
of the $X$ and $Y$ axes in our $xy$-plane. If we think of $P$ as defining a
linear transformation $T$ which maps $\mathbb{R}^2$ onto itself, with $T(\mathbf{x}) = P\mathbf{x}$,
then the $X$ and $Y$ axes are the images of the $x$ and $y$ axes under the
linear transformation $T$. Why? The positive $x$ axis is described as all
positive multiples of the vector $\mathbf{e}_1 = (1, 0)^T$, and the positive $y$ axis as
all positive multiples of $\mathbf{e}_2 = (0, 1)^T$. The images of these vectors are

\[ T(\mathbf{e}_1) = \mathbf{v}_1 \quad \text{and} \quad T(\mathbf{e}_2) = \mathbf{v}_2. \]

Analogous descriptions of the $X$ and $Y$ axes are that the positive $X$
axis is described as all positive multiples of the vector $[1, 0]_B$, and the
positive $Y$ axis as positive multiples of $[0, 1]_B$. But these are just the
coordinates in the basis $B$ of the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$, respectively.

This allows us to draw the new $X$ and $Y$ axes in the $xy$-plane as the
lines in the directions of the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$.

In this example, the new $X$, $Y$ axes are a rotation (anticlockwise) of
the old $x$, $y$ axes by $\pi/4$ radians. We looked at rotations in Section 7.1.3,
where we showed that the matrix representing a rotation anticlockwise
by an angle $\theta$ is given by

\[ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]

In fact, in this example we carefully chose the column positions of the
eigenvectors so that $P$ would define a rotation anticlockwise, and it is
always possible to do so. Why is that? Well, an orthonormal basis of $\mathbb{R}^2$
consists of two unit vectors which are orthogonal. If one of the
vectors is $\mathbf{u}_1 = (u_1, u_2)^T$ with $u_1 > 0$ and $u_2 > 0$, then the other vector
must be either $\mathbf{u}_2 = (-u_2, u_1)^T$ or $-\mathbf{u}_2$, and we can choose $\mathbf{u}_2$. If

\[ P = \begin{pmatrix} u_1 & -u_2 \\ u_2 & u_1 \end{pmatrix}, \]

then $P$ is the matrix of a rotation anticlockwise, since it is possible to
find an angle $\theta$ such that $\cos\theta = u_1$ and $\sin\theta = u_2$.
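
The construction just described can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the use of `math.atan2` to recover the angle is a choice made here, not something taken from the text.

```python
import math

def rotation_from_unit_vector(u1, u2):
    """Return (P, theta), where P = [[u1, -u2], [u2, u1]] and theta satisfies
    cos(theta) = u1, sin(theta) = u2. Assumes (u1, u2) is a unit vector."""
    if not math.isclose(u1 * u1 + u2 * u2, 1.0):
        raise ValueError("(u1, u2) must be a unit vector")
    P = [[u1, -u2], [u2, u1]]
    theta = math.atan2(u2, u1)
    return P, theta

# Unit vector in the direction (1, 1), as in the example above
P, theta = rotation_from_unit_vector(1 / math.sqrt(2), 1 / math.sqrt(2))

# The standard anticlockwise rotation matrix for the recovered angle
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
print(theta, P, R)
```

For this unit vector the recovered angle is $\pi/4$, and `P` agrees with the rotation matrix `R` up to rounding.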


Activity 11.43 Think about why these two assertions are true: why is
$\mathbf{u}_2$ (or $-\mathbf{u}_2$) the second vector in the orthonormal basis, and why is it
possible to find such an angle $\theta$?


By choosing to write the unit eigenvectors as the columns of P in 
this way, it is easy to find the positions of the new axes because we 
can recognise the linear transformation as a rotation anticlockwise. 
However, any choice of P would still enable us to sketch this graph (see 
Exercise 11.8). 

Continuing our example, we are now in a position to sketch the
graph of $C$ in the $xy$-plane. First draw the usual $x$ and $y$ axes. The
positive $X$ axis is in the direction of the vector $(1, 1)^T$ and the positive
$Y$ axis is along the direction of the vector $(-1, 1)^T$. These new $X$, $Y$
axes are a rotation of $\pi/4$ radians anticlockwise of the old $x$, $y$ axes.
So we draw the $X$ and $Y$ axes along the lines $y = x$ and $y = -x$. We
now sketch the ellipse $X^2 + 4Y^2 = 1$ in standard position with respect
to the $X$ and $Y$ axes. It intersects the $X$ axis at $X = \pm 1$ and the $Y$ axis
at $Y = \pm 1/2$. See Figure 11.3.


Activity 11.44 Where does the curve C intersect the x and y axes? 


Figure 11.3 The ellipse $5x^2 - 6xy + 5y^2 = 2$

You should be asking another question about this method. How do we
know that the linear transformation defined by $P$ did not change the
shape of the curve?

It turns out that a linear transformation given by an orthogonal
matrix $P$, $T(\mathbf{x}) = P\mathbf{x}$, is a rigid motion of the plane: nothing changes
shape because the linear transformation preserves both lengths and
angles. Such a linear transformation is called an isometry. In order to
prove this assertion, note that both the length of a vector $\mathbf{v}$ and the angle
between two vectors $\mathbf{v}$, $\mathbf{w}$ are defined in $\mathbb{R}^n$ by the inner product,

\[ \|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}, \qquad \cos\theta = \frac{\langle \mathbf{v}, \mathbf{w} \rangle}{\|\mathbf{v}\|\,\|\mathbf{w}\|}. \]

So we only need to show that P preserves the inner product. We have 
the following general result: 


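Theorem 11.45 The linear transformation defined by an orthogonal
matrix $P$ preserves the standard inner product on $\mathbb{R}^n$.


Proof: If the linear transformation defined by $P$ is denoted by
$T : \mathbb{R}^n \to \mathbb{R}^n$, then $T(\mathbf{x}) = P\mathbf{x}$. Let $\mathbf{v}, \mathbf{w} \in \mathbb{R}^n$. Then, taking the inner
product of the images, we have

\[ \langle P\mathbf{v}, P\mathbf{w} \rangle = (P\mathbf{v})^T (P\mathbf{w}) = \mathbf{v}^T P^T P \mathbf{w} = \mathbf{v}^T \mathbf{w} = \langle \mathbf{v}, \mathbf{w} \rangle. \]

The inner product between two vectors is equal to the inner product
between their images under $P$.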


Therefore, length and angle are preserved by such a linear transforma- 
tion, and hence so also is the shape of any curve. This validates our 
method. 
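
A quick numerical check of this result for a rotation of the plane is sketched below in Python. The angle and the two sample vectors are arbitrary choices made for illustration, and the helper names are hypothetical.

```python
import math

def rotate(theta, v):
    """Apply the anticlockwise rotation matrix for angle theta to a vector v in R^2."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def dot(v, w):
    """Standard inner product on R^2."""
    return v[0] * w[0] + v[1] * w[1]

theta = math.pi / 4
v, w = (3.0, -1.0), (2.0, 5.0)

before = dot(v, w)                                  # inner product of v and w
after = dot(rotate(theta, v), rotate(theta, w))     # inner product of their images
length_before = math.sqrt(dot(v, v))
length_after = math.sqrt(dot(rotate(theta, v), rotate(theta, v)))
print(before, after, length_before, length_after)
```

Up to rounding error the inner product and the length are unchanged by the rotation, as the theorem predicts for any orthogonal matrix.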


11.3 Learning outcomes 


You should now be able to:


• know what is meant by orthogonal diagonalisation

• explain why an $n \times n$ matrix can be orthogonally diagonalised if
and only if it possesses an orthonormal set of $n$ eigenvectors

• orthogonally diagonalise a symmetric matrix and know that only
symmetric matrices can be orthogonally diagonalised

• know what is meant by a quadratic form and what it means to say
that a quadratic form or a symmetric matrix is positive definite,
positive semi-definite, negative definite, negative semi-definite and
indefinite; and be able to determine which of these is the case

• use orthogonal diagonalisation to analyse conic sections.

11.4 Comments on activities


Activity 11.11 Check that the inner product of any two vectors is equal 
to 0. 


Activity 11.13 You should obtain that $P^T P = I$.


Activity 11.16 The reduced row echelon form shows that a basis of the
row space of $B - I$ is given by the vector $\mathbf{v}_3 = (1, 1, 1)^T$. Since the
row space and the null space of a matrix are orthogonal, this vector $\mathbf{v}_3$
is orthogonal to every vector in the null space of $B - I$, which is the
eigenspace of $\lambda = 1$. Therefore, $\mathbf{v}_3$ must be an eigenvector for the third
eigenvalue, $\lambda = 4$, and this can be easily checked by finding $B\mathbf{v}_3$.


Activity 11.17 You should check that $\mathbf{u}_2 \perp \mathbf{u}_1$. You should also check
that $\mathbf{u}_2 \perp \mathbf{v}_3$ to show that it is in the eigenspace for $\lambda = 1$.


Activity 11.18 Let $A$ be a $2 \times 2$ symmetric matrix. Then by Theorem 11.7,
$A$ has real eigenvalues. Let $\lambda_1$ be an eigenvalue of $A$, and let
$\mathbf{v}_1$ be a corresponding unit eigenvector, so $A\mathbf{v}_1 = \lambda_1\mathbf{v}_1$ and $\|\mathbf{v}_1\| = 1$.
Extend $\{\mathbf{v}_1\}$ to a basis $\{\mathbf{v}_1, \mathbf{x}_2\}$ of $\mathbb{R}^2$, then using Gram-Schmidt (starting
with $\mathbf{v}_1$, which is a unit vector) make this into an orthonormal
basis $B = \{\mathbf{v}_1, \mathbf{v}_2\}$ of $\mathbb{R}^2$. Let $P$ be the matrix whose columns are
the vectors in $B$. Then $P$ is the transition matrix from $B$ coordinates
to standard coordinates, and $P$ is orthogonal by Theorem 10.18. By
Theorem 7.37, the matrix $P^{-1}AP = P^T A P$ represents the linear
transformation $T(\mathbf{x}) = A\mathbf{x}$ in the basis $B$. By Theorem 7.36, the first column
of $P^T A P$ will be the coordinate vector of $T(\mathbf{v}_1)$ with respect to the basis
$B$. Now, $T(\mathbf{v}_1) = A\mathbf{v}_1 = \lambda_1\mathbf{v}_1$, so this coordinate vector is

\[ \begin{bmatrix} \lambda_1 \\ 0 \end{bmatrix}_B. \]

Then

\[ P^T A P = \begin{pmatrix} \lambda_1 & a_1 \\ 0 & a_2 \end{pmatrix}. \]

But the matrix $P^T A P$ is symmetric, since

\[ (P^T A P)^T = P^T A^T P = P^T A P, \]

so it must be of the form

\[ P^T A P = \begin{pmatrix} \lambda_1 & 0 \\ 0 & a_2 \end{pmatrix}. \]

Therefore, $A$ can be orthogonally diagonalised.

Activity 11.20 For example, let

Activity 11.25 $g(x, y, z) = 3x^2 + 7y^2 - 5z^2 + 4xy - 2xz + 8yz$.


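Activity 11.28 Let the eigenvalues of $A$ be $\lambda_1 = 0, \lambda_2, \ldots, \lambda_n$, $\lambda_i \geq 0$,
and let $\mathbf{v}_1$ be an eigenvector corresponding to $\lambda_1 = 0$. Then $\mathbf{v}_1^T A \mathbf{v}_1 = 0$
so $\mathbf{x}^T A \mathbf{x}$ is not positive definite. But

\[ \mathbf{x}^T A \mathbf{x} = \mathbf{z}^T D \mathbf{z} = \lambda_2 z_2^2 + \cdots + \lambda_n z_n^2 \geq 0, \]

since it is the sum of non-negative numbers, so $\mathbf{x}^T A \mathbf{x}$ is positive
semi-definite.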


Activity 11.33 If a row (or column) of a matrix $A$ is multiplied by a
constant, then the determinant of $A$ is multiplied by that constant. Since
the $r$th principal minor has $r$ rows and each is multiplied by $-1$ to form
$|-A_r|$, we have $|-A_r| = (-1)^r |A_r|$.


Activity 11.36 We have $|A| = \lambda_1 \lambda_2 \cdots \lambda_n$. If $|A| = 0$, one of these
factors (that is, one of the eigenvalues) must be 0. On the other hand,
if $|A| \neq 0$, then none of these factors is 0. If $A$ is neither positive nor
negative definite, then $A$ must have some positive eigenvalues and some
negative eigenvalues, so $A$ must be indefinite.


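Activity 11.38 Let $\mathbf{y} \in \mathbb{R}^n$. Then $\mathbf{y}^T D \mathbf{y}$ is a sum of squares with positive
coefficients, therefore $\mathbf{y}^T D \mathbf{y} > 0$ for all $\mathbf{y} \neq \mathbf{0}$, and $\mathbf{y}^T D \mathbf{y} = 0$ if and
only if $\mathbf{y} = \mathbf{0}$.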


Activity 11.40 These are $y = \pm\sqrt{2}\,x$.


Activity 11.43 Certainly $\langle \mathbf{u}_1, \mathbf{u}_2 \rangle = 0$, and each is a unit vector, so
$\{\mathbf{u}_1, \mathbf{u}_2\}$ is an orthonormal basis of $\mathbb{R}^2$. (You can show that any vector
$\mathbf{a} = (a, b)^T$ such that $u_1 a + u_2 b = 0$ must be a scalar multiple of $\mathbf{u}_2$
and there are only two such unit vectors, namely $\pm\mathbf{u}_2$.)

The angle $\theta$ is defined by the two equations, $\cos\theta = u_1$ and $\sin\theta = u_2$.
This works since

\[ \cos^2\theta + \sin^2\theta = u_1^2 + u_2^2 = 1. \]

Activity 11.44 These points are found using the original equation,
$5x^2 - 6xy + 5y^2 = 2$. If $x = 0$, $y = \pm\sqrt{2/5}$, and, similarly, if $y = 0$,
$x = \pm\sqrt{2/5}$.


11.5 Exercises 


Exercise 11.1 Orthogonally diagonalise the matrix

\[ A = \begin{pmatrix} 7 & 0 & 9 \\ 0 & 2 & 0 \\ 9 & 0 & 7 \end{pmatrix}. \]


Why do you know this can be done before even attempting to do it? 


Exercise 11.2 Let 


z3 4 2 1 
a=(1 2 aT w= (1), 
D 2 j 0 


Show that v; is an eigenvector of A and find its corresponding eigen- 
value. Find a basis of the eigenspace corresponding to this eigenvalue. 
Orthogonally diagonalise the matrix A. 


Exercise 11.3 Prove that the following quadratic form

\[ q(x, y, z) = 6xy - 4yz + 2xz - 4x^2 - 2y^2 - 4z^2 \]

is neither positive definite nor negative definite. Is it indefinite?
Determine whether the quadratic form

\[ f(x, y, z) = 2xy - 4yz + 6xz - 4x^2 - 2y^2 - 4z^2 \]

is positive definite, negative definite or indefinite.

Exercise 11.4 Consider again the matrix $A$ in Exercise 11.1. Express
the quadratic form $f(x, y, z) = \mathbf{x}^T A \mathbf{x}$ as a function of the variables $x$,
$y$ and $z$. Write down a matrix $Q$ so that if $\mathbf{x} = Q\mathbf{z}$ with $\mathbf{z} = (X, Y, Z)^T$,
then

\[ f(x, y, z) = \mathbf{z}^T D \mathbf{z} = \lambda_1 X^2 + \lambda_2 Y^2 + \lambda_3 Z^2, \quad \text{where } \lambda_1 > \lambda_2 > \lambda_3. \]

Is the quadratic form $f(x, y, z)$ positive definite, negative definite or
indefinite?
Find, if possible, a vector $\mathbf{a} = (a, b, c)^T$ such that $f(a, b, c) = -8$.


Exercise 11.5 Prove that the diagonal entries of a positive definite
$n \times n$ matrix $A$ must be positive numbers. (Do this by considering
$\mathbf{e}_i^T A \mathbf{e}_i$, where $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ is the standard basis in $\mathbb{R}^n$.)
Give an example to show that the converse statement is not true; that is,
write down a symmetric matrix which has positive numbers along the
diagonal but which is not positive definite.


Exercise 11.6 Let $B$ be an $m \times k$ matrix with full column rank (meaning
$\operatorname{rank}(B) = k$). Show that the matrix $B^T B$ is a positive definite
symmetric matrix.

Show also that $B^T B$ is invertible, by proving that any positive definite
matrix is invertible.


Exercise 11.7 Consider the quadratic form

\[ f(x, y, z) = x^2 - 4xy + 5y^2 - 2xz + 6yz + 2z^2. \]

Write down a symmetric matrix $A$ such that $f(x, y, z) = \mathbf{x}^T A \mathbf{x}$ for
$\mathbf{x} = (x, y, z)^T$. Is the matrix $A$ negative definite, positive definite or
indefinite?

Is there a vector $\mathbf{a} = (a, b, c)^T$ such that $f(a, b, c) < 0$? Investigate
carefully and justify your answer. Write down such a vector if one exists.


Exercise 11.8 Sketch the curve $5x^2 - 6xy + 5y^2 = 2$ by reworking
Example 11.41, this time choosing $Q$ to be the orthogonal matrix


o-( F i 


Exercise 11.9 Express the quadratic form $9x^2 + 4xy + 6y^2$ as $\mathbf{x}^T A \mathbf{x}$,
where $A$ is a symmetric $2 \times 2$ matrix, and find the eigenvalues of $A$.
Deduce whether the quadratic form is positive definite or otherwise,
and determine what type of conic section is given by the equation

\[ 9x^2 + 4xy + 6y^2 = 10. \]

Orthogonally diagonalise the matrix $A$ and use this information to
sketch the curve $9x^2 + 4xy + 6y^2 = 10$ in the $xy$-plane.


Exercise 11.10 Show that the vectors $\mathbf{v}_1 = (1, 1)^T$ and $\mathbf{v}_2 = (-1, 1)^T$
are eigenvectors of the symmetric matrix


What are the corresponding eigenvalues? 


Exercise 11.11 Sketch the curve $x^2 + y^2 + 6xy = 4$ in the $xy$-plane.
Find the points of intersection with the old and new axes.

11.6 Problems


Problem 11.1 Find the eigenvalues and corresponding eigenvectors of 
the matrix 


Find an orthogonal matrix $P$ and a diagonal matrix $D$ such that
$P^T A P = D$.


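Problem 11.2 Consider the matrix $B$ and the vector $\mathbf{v}_1$, where

\[ B = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 4 & 3 \\ 0 & 3 & 1 \end{pmatrix}, \qquad \mathbf{v}_1 = \begin{pmatrix} 1 \\ 5 \\ 3 \end{pmatrix}. \]

Show that $\mathbf{v}_1$ is an eigenvector of $B$ and find its corresponding eigenvalue.
Orthogonally diagonalise the matrix $B$.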


Problem 11.3 Let

\[ A = \begin{pmatrix} 1 & -4 & 2 \\ -4 & 1 & -2 \\ 2 & -2 & -2 \end{pmatrix}. \]

Find the eigenvalues of $A$ and for each eigenvalue find an orthonormal
basis for the corresponding eigenspace.
Hence find an orthogonal matrix $P$ such that

\[ P^{-1}AP = P^T A P = D. \]

Write down $D$ and check that $P^T A P = D$.


Problem 11.4 Consider the matrix A and the vector vı, where 


3e =2 l —1 
a=- 6 2) v=o] 
lL ' ee" 3 1. 


(a) Show that $\mathbf{v}$ is an eigenvector of $A$ and find its corresponding
eigenvalue.

Given that $\lambda = 8$ is the only other eigenvalue of $A$, how do you
know that the matrix $A$ can be diagonalised before attempting to
do so?

Diagonalise the matrix $A$; that is, find an invertible matrix $P$
and a diagonal matrix $D$ such that $P^{-1}AP = D$.

Then orthogonally diagonalise $A$; that is, find an orthogonal
matrix $Q$ such that $Q^T A Q = D$.

(b) Write out an expression for the quadratic form

\[ f(x, y, z) = \mathbf{x}^T A \mathbf{x}, \quad \text{where } \mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \]

in terms of $x$, $y$ and $z$. Is the quadratic form $f$ positive definite,
negative definite or indefinite?

Write down a basis $B$, and numbers $\lambda_1, \lambda_2, \lambda_3$ such that $f$ can
be expressed as

\[ f(x, y, z) = \lambda_1 X^2 + \lambda_2 Y^2 + \lambda_3 Z^2, \]

where $X$, $Y$, $Z$ are coordinates in the basis $B$. Write down the
transition matrix from coordinates in this basis $B$ to standard
coordinates.

Evaluate $f(x, y, z)$ at one unit eigenvector corresponding to
each eigenvalue.


Problem 11.5 Suppose $\mathbf{x}^T A \mathbf{x}$ is a quadratic form, and $\lambda$ is an eigenvalue
of $A$. If $\mathbf{u}$ is a unit eigenvector corresponding to $\lambda$, show that

\[ \mathbf{u}^T A \mathbf{u} = \lambda. \]


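Problem 11.6 Consider the following matrix $A$ and vector $\mathbf{v}$:

\[ A = \begin{pmatrix} -5 & 1 & 2 \\ 1 & -5 & 2 \\ 2 & 2 & -2 \end{pmatrix}, \qquad \mathbf{v} = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}. \]

Find a basis of the null space of $A$, $N(A)$. For this matrix $A$, why is
the row space equal to the column space, $RS(A) = CS(A)$? Show that
$CS(A)$ is a plane in $\mathbb{R}^3$, and find its Cartesian equation.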

Show that v is an eigenvector of A and find the corresponding 
eigenvalue A. Find all eigenvectors corresponding to this eigenvalue. 
Orthogonally diagonalise A. 

What is the relationship between the column space of A and the 
eigenspace corresponding to the eigenvalue of multiplicity 2? Why 
does this happen for this particular matrix A? 


Problem 11.7 Determine whether either of the following quadratic
forms is positive definite:

\[ F(x, y) = 3x^2 - 8xy + 3y^2, \qquad G(x, y) = 43x^2 - 48xy + 57y^2. \]

Find, if possible, points $(a_i, b_i)$, $(i = 1, 2, 3, 4)$, such that

\[ F(a_1, b_1) > 0, \quad F(a_2, b_2) < 0, \quad G(a_3, b_3) > 0, \quad G(a_4, b_4) < 0. \]

Problem 11.8


(a) Express the following quadratic form as $\mathbf{x}^T A \mathbf{x}$ where $A$ is a
symmetric matrix:

\[ f(x, y, z) = 6yz - x^2 + 2xy - 4y^2 - 6z^2. \]

Determine whether the quadratic form is positive definite, negative
definite or indefinite.
(b) Do the same for the quadratic form

\[ g(x, y, z) = 6yz - x^2 + 2xy - 4y^2. \]

Problem 11.9 Find a symmetric matrix $A$ such that the quadratic form

\[ f(x, y, z) = 3x^2 + 4xy + 2y^2 + 5z^2 - 2xz + 2yz \]

can be expressed as $\mathbf{x}^T A \mathbf{x}$. Determine the signs of the eigenvalues of $A$.


Problem 11.10 Let $A$ be a positive definite symmetric $n \times n$ matrix.
Show that the mapping from pairs of vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ to the real
numbers defined by

\[ \langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T A \mathbf{y} \]

defines an inner product on $\mathbb{R}^n$, where the $1 \times 1$ matrix $\mathbf{x}^T A \mathbf{y}$ is
interpreted as the real number which is its only entry.


Problem 11.11 Let $\lambda$ be a constant and let


| -3 
s=- s) 
2 ì 


For what value(s) of $\lambda$ will the matrix $B^T B$ be invertible? For what
value(s) of $\lambda$ will the matrix $B^T B$ be positive definite? Justify your
answers.


Problem 11.12 If $A$ is an $m \times k$ matrix, show that the matrix $A^T A$ can
never be negative definite.


Problem 11.13 Orthogonally diagonalize the matrix 


1 2 
a=(; a 


and use this to sketch the curve $\mathbf{x}^T A \mathbf{x} = 3$ in the $xy$-plane.
Find the points of intersection of the curve with the old and the new 
axes. 


Problem 11.14 Let $C$ be the curve defined by

\[ 3x^2 + 2\sqrt{3}\,xy + 5y^2 = 6. \]

Find a symmetric matrix $A$ such that $C$ is given by $\mathbf{x}^T A \mathbf{x} = 6$.

Find an orthogonal matrix $P$ and a diagonal matrix $D$, such that
$P^T A P = D$, and such that the linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$
defined by $T(\mathbf{x}) = P\mathbf{x}$ is an anticlockwise rotation. Use this to sketch
the curve in the $xy$-plane, showing the old and new axes on your
diagram. Compare this with Problem 7.11, where the same curve was
sketched using a different linear transformation.


Problem 11.15 Sketch the curve

\[ 3x^2 + 4xy + 6y^2 = 14 \]

in the $xy$-plane.

Direct sums and projections


In this chapter, we meet several important new ideas: direct sum, orthog- 
onal complement and projection. These are useful in the theoretical 
study of linear algebra, but they also lead us to a very useful practical 
solution to a real-life problem, namely that of finding the ‘best fit’ of a 
particular type to a set of data. 


12.1 The direct sum of two subspaces 


A very useful idea is the sum of two subspaces of a vector space. A 
special case of this, a direct sum, is of particular importance. 


12.1.1 The sum of two subspaces 


For subspaces U and W of a vector space V, the sum of U and W, 
written U + W, is simply the set of all vectors in V which are obtained 
by adding together a vector in U and a vector in W. 


Definition 12.1 If $U$ and $W$ are subspaces of a vector space $V$, then
the sum of $U$ and $W$, denoted by $U + W$, is the set

\[ U + W = \{\mathbf{u} + \mathbf{w} \mid \mathbf{u} \in U, \ \mathbf{w} \in W\}. \]

The sum $U + W$ is also a subspace of $V$.

Activity 12.2 Prove that $U + W$ is a subspace of $V$.


Note the difference between $U + W$ and $U \cup W$, which is the set that
contains all vectors that are in $U$ or in $W$. The set $U \cup W$ is not
generally a subspace, since if, say, $\mathbf{u} \in U$, $\mathbf{u} \notin W$ and $\mathbf{w} \in W$, $\mathbf{w} \notin U$,
then $\mathbf{u} + \mathbf{w} \notin U \cup W$; in which case it is not closed under addition (see
Exercise 5.6). The subspace $U + W$ contains both $U$ and $W$, but is
generally much larger.

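In fact, $U + W$ is the 'smallest' subspace of $V$ that contains both
$U$ and $W$. By this, we mean that if $S$ is a subspace of $V$ and we have
both $U \subseteq S$ and $W \subseteq S$, then $U + W \subseteq S$. To see this, we can simply
note that for any $\mathbf{u} \in U$ and any $\mathbf{w} \in W$, we will have $\mathbf{u} \in S$ and $\mathbf{w} \in S$
and so, because $S$ is a subspace, $\mathbf{u} + \mathbf{w} \in S$. This shows that any vector
of the form $\mathbf{u} + \mathbf{w}$ is in $S$, which means that $U + W \subseteq S$.


Activity 12.3 Suppose that $\mathbf{u}, \mathbf{w} \in \mathbb{R}^n$. Prove that

\[ \operatorname{Lin}\{\mathbf{u}\} + \operatorname{Lin}\{\mathbf{w}\} = \operatorname{Lin}\{\mathbf{u}, \mathbf{w}\}. \]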


12.1.2 Direct sums 


A sum of two subspaces is sometimes a direct sum. 


Definition 12.4 A sum of two subspaces $U + W$ is said to be a direct
sum if $U \cap W = \{\mathbf{0}\}$.


That is, the sum is direct if the intersection of $U$ and $W$ is as small as
it can be. (Since both are subspaces, they will both contain $\mathbf{0}$; the sum
is direct if that is all they have in common.) When a sum of subspaces
is direct, we use the special notation $U \oplus W$ to mean $U + W$. So the
notation means the sum of the subspaces, and the use of the special
symbol signifies that the sum is direct.

It turns out that there is another, often very useful, way of charac- 
terising when a sum of subspaces is direct, as the following theorem 
shows: 


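Theorem 12.5 Suppose $U$ and $W$ are subspaces of a vector space.
Then the sum of $U$ and $W$ is direct if and only if every vector $\mathbf{z}$ in the
sum can be written uniquely (that is, in one way only) as $\mathbf{z} = \mathbf{u} + \mathbf{w}$,
where $\mathbf{u} \in U$ and $\mathbf{w} \in W$. Explicitly, the sum is direct if and only if
whenever $\mathbf{u}, \mathbf{u}' \in U$ and $\mathbf{w}, \mathbf{w}' \in W$ and $\mathbf{u} + \mathbf{w} = \mathbf{u}' + \mathbf{w}'$, then $\mathbf{u} = \mathbf{u}'$
and $\mathbf{w} = \mathbf{w}'$.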


Proof. There are two things we need to prove to establish the theorem.
First, we need to show that if the sum is direct, then any element in the
sum has a unique expression as a sum of a vector in $U$ and a vector
in $W$. Secondly, we need to show that, conversely, if it's the case that
any vector in the sum can be expressed in such a way uniquely, then it
follows that the sum is direct (that is, $U \cap W = \{\mathbf{0}\}$).

Suppose first that the sum is direct, so that $U \cap W = \{\mathbf{0}\}$. Now
suppose that $\mathbf{u}, \mathbf{u}' \in U$ and $\mathbf{w}, \mathbf{w}' \in W$ and that $\mathbf{u} + \mathbf{w} = \mathbf{u}' + \mathbf{w}'$. Then
we may rearrange this as follows:

\[ \mathbf{u} - \mathbf{u}' = \mathbf{w}' - \mathbf{w}. \]

Now, the vector on the left-hand side is in $U$, because $\mathbf{u}$ and $\mathbf{u}'$ are and
because $U$ is a subspace. Similarly, the vector on the right is in $W$. So the
vector $\mathbf{v} = \mathbf{u} - \mathbf{u}' = \mathbf{w}' - \mathbf{w}$ is in both $U$ and $W$. But $U \cap W = \{\mathbf{0}\}$, so
$\mathbf{v} = \mathbf{0}$, which means $\mathbf{u} - \mathbf{u}' = \mathbf{0} = \mathbf{w}' - \mathbf{w}$ and so $\mathbf{u} = \mathbf{u}'$ and $\mathbf{w} = \mathbf{w}'$.

Now suppose that every vector $\mathbf{z}$ in the sum can be written uniquely
as $\mathbf{z} = \mathbf{u} + \mathbf{w}$. We want to show that this means $U \cap W = \{\mathbf{0}\}$. Now,
suppose that $\mathbf{z} \in U \cap W$. Then we can write $\mathbf{z}$ as $\mathbf{z} = \mathbf{z} + \mathbf{0}$, where
$\mathbf{z} \in U$ and $\mathbf{0} \in W$, and we can also write $\mathbf{z} = \mathbf{0} + \mathbf{z}$, where $\mathbf{0} \in U$ and
$\mathbf{z} \in W$. But $\mathbf{z}$ can be expressed in only one way as a sum of a vector
in $U$ and a vector in $W$. So it must be the case that $\mathbf{z} = \mathbf{0}$. Otherwise,
these two expressions are different ways of expressing $\mathbf{z}$ as a vector in
$U$ plus a vector in $W$. Here, we assumed $\mathbf{z}$ was any member of $U \cap W$
and we showed that $\mathbf{z} = \mathbf{0}$. It follows that $U \cap W = \{\mathbf{0}\}$ and the sum is
direct. $\square$


This theorem shows that there are really two equivalent definitions of 
what it means for a sum of subspaces to be direct, and it is useful to be 
able to work with either one. A sum of subspaces U and W is direct if 
either of the following equivalent conditions holds: 


• $U \cap W = \{\mathbf{0}\}$.
• Any vector in $U + W$ can be written uniquely in the form $\mathbf{u} + \mathbf{w}$
with $\mathbf{u} \in U$ and $\mathbf{w} \in W$.


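Example 12.6 Suppose that $\mathbf{u}, \mathbf{w} \in \mathbb{R}^n$ and that $\mathbf{u}$ and $\mathbf{w}$ are linearly
independent. Then the sum $\operatorname{Lin}\{\mathbf{u}\} + \operatorname{Lin}\{\mathbf{w}\}$ is direct. In other words,

\[ \operatorname{Lin}\{\mathbf{u}, \mathbf{w}\} = \operatorname{Lin}\{\mathbf{u}\} \oplus \operatorname{Lin}\{\mathbf{w}\}. \]

(We saw in Activity 12.3 that $\operatorname{Lin}\{\mathbf{u}, \mathbf{w}\} = \operatorname{Lin}\{\mathbf{u}\} + \operatorname{Lin}\{\mathbf{w}\}$; what's new
here is that the sum is direct.)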

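To show the sum is direct, we can verify that if $U = \operatorname{Lin}\{\mathbf{u}\}$
and $W = \operatorname{Lin}\{\mathbf{w}\}$, then $U \cap W = \{\mathbf{0}\}$. So suppose $\mathbf{z} \in U \cap W$. Then
$\mathbf{z} \in \operatorname{Lin}\{\mathbf{u}\}$ and $\mathbf{z} \in \operatorname{Lin}\{\mathbf{w}\}$, so there are scalars $\alpha, \beta$ such that $\mathbf{z} = \alpha\mathbf{u}$
and $\mathbf{z} = \beta\mathbf{w}$. If $\mathbf{z} \neq \mathbf{0}$, then we must have $\alpha \neq 0$ and $\beta \neq 0$. So $\alpha\mathbf{u} = \beta\mathbf{w}$
and therefore $\mathbf{u} = (\beta/\alpha)\mathbf{w}$. But this can't be, since $\mathbf{u}$ and $\mathbf{w}$ are linearly
independent. So we can only have $\mathbf{z} = \mathbf{0}$. This shows that the only vector
in both $U$ and $W$ is the zero vector. And, clearly, $\mathbf{0} \in U \cap W$ because,
since $U$ and $W$ are subspaces, $\mathbf{0}$ belongs to both of them.


12.2 Orthogonal complements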


For a subspace S of a vector space V with an inner product, there is a 
particularly important direct sum involving S. This involves a subspace 
known as the orthogonal complement of S. 


12.2.1 The orthogonal complement of a subspace 


Suppose that $V$ is a vector space with an inner product, denoted as
$\langle \mathbf{x}, \mathbf{y} \rangle$ for $\mathbf{x}, \mathbf{y} \in V$. Given a subset $S$ of $V$, we define the following set,
denoted by $S^\perp$.


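Definition 12.7 (Orthogonal complement) The orthogonal complement
of a subset $S$ of a vector space $V$ with inner product $\langle \mathbf{x}, \mathbf{y} \rangle$ is

\[
\begin{aligned}
S^\perp &= \{\mathbf{v} \in V \mid \text{for all } \mathbf{s} \in S,\ \langle \mathbf{v}, \mathbf{s} \rangle = 0\} \\
        &= \{\mathbf{v} \in V \mid \text{for all } \mathbf{s} \in S,\ \mathbf{v} \perp \mathbf{s}\}.
\end{aligned}
\]

In other words, $S^\perp$ is the set of vectors that are orthogonal to every
vector in $S$.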


It turns out that $S^\perp$ is a subspace (and not merely a subset) of $V$. (This
is true for any set $S$: $S$ itself need not be a subspace.)


Theorem 12.8 For any subset $S$ of $V$, $S^\perp$ is a subspace of $V$.


Activity 12.9 Prove Theorem 12.8. 


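Example 12.10 Suppose that $V = \mathbb{R}^3$ with the standard inner product
and suppose that $S = \operatorname{Lin}\{\mathbf{u}\}$, where $\mathbf{u} = (1, 2, -1)^T$. Then $S^\perp$ is the
set of all vectors $\mathbf{v}$ such that, for all $\mathbf{s} \in S$, $\langle \mathbf{v}, \mathbf{s} \rangle = 0$. Now, any member
of $S$ is of the form $\alpha\mathbf{u}$. We have $\langle \mathbf{v}, \alpha\mathbf{u} \rangle = \alpha\langle \mathbf{v}, \mathbf{u} \rangle$, so $\mathbf{v} = (x, y, z)^T$ is
in $S^\perp$ precisely when, for all $\alpha$, $\alpha\langle \mathbf{v}, \mathbf{u} \rangle = 0$, which means $\langle \mathbf{v}, \mathbf{u} \rangle = 0$.
So we see that

\[ S^\perp = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; x + 2y - z = 0 \right\}. \]

That is, $S$ is the line through the origin in the direction of $\mathbf{u}$ and $S^\perp$
is the plane through the origin perpendicular to this line; that is, with
normal vector $\mathbf{u}$.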


Example 12.11 Suppose again that $V = \mathbb{R}^3$ with the standard
inner product and this time suppose that $S = \operatorname{Lin}\{\mathbf{u}, \mathbf{w}\}$, where
$\mathbf{u} = (1, 2, -1)^T$ and $\mathbf{w} = (1, 0, 1)^T$. Then what is $S^\perp$? Considering
Example 12.10, you might expect that since $S$ is a plane through the
origin, then $S^\perp$ is the normal line to the plane through the origin, and
indeed this is the case, but let us see precisely why.

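The typical element of $S$ is of the form $\alpha\mathbf{u} + \beta\mathbf{w}$. Suppose $\mathbf{z} \in S^\perp$.
Then $\mathbf{z}$ is orthogonal to every member of $S$. In particular, since $\mathbf{u} \in S$
and $\mathbf{w} \in S$, $\mathbf{z} \perp \mathbf{u}$ and $\mathbf{z} \perp \mathbf{w}$. Conversely, if $\mathbf{z} \perp \mathbf{u}$ and $\mathbf{z} \perp \mathbf{w}$, then for
any $\alpha, \beta$,

\[ \langle \mathbf{z}, \alpha\mathbf{u} + \beta\mathbf{w} \rangle = \alpha\langle \mathbf{z}, \mathbf{u} \rangle + \beta\langle \mathbf{z}, \mathbf{w} \rangle = 0 + 0 = 0, \]

so $\mathbf{z}$ is orthogonal to every vector in $S$. So we see that $S^\perp$ is exactly the
set of vectors $\mathbf{z}$ orthogonal to both $\mathbf{u}$ and $\mathbf{w}$. Now, for $\mathbf{z} = (x, y, z)^T$,
this means we must have both

\[ x + 2y - z = 0 \quad \text{and} \quad x + z = 0. \]

Solving this system, we see that

\[ S^\perp = \left\{ \begin{pmatrix} r \\ -r \\ -r \end{pmatrix} \;\middle|\; r \in \mathbb{R} \right\}, \]

which is a line through the origin. So, here, $S$ is a plane through the
origin and $S^\perp$ is the line through the origin that is perpendicular to the
plane $S$.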
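As a quick numerical check of Example 12.11, the following Python sketch confirms that the direction vector $(1, -1, -1)^T$ is orthogonal to a sample of linear combinations $\alpha\mathbf{u} + \beta\mathbf{w}$. The helper names `dot` and `combo` are illustrative choices, not notation from the text, and checking finitely many combinations illustrates the result rather than proving it.

```python
def dot(a, b):
    """Standard inner product on R^n."""
    return sum(x * y for x, y in zip(a, b))

u = (1, 2, -1)
w = (1, 0, 1)
n = (1, -1, -1)  # direction of the line S-perp found in Example 12.11

def combo(alpha, beta):
    """Return alpha*u + beta*w, a typical element of S = Lin{u, w}."""
    return tuple(alpha * ui + beta * wi for ui, wi in zip(u, w))

# n should be orthogonal to every sampled element of S.
samples = [combo(a, b) for a in range(-3, 4) for b in range(-3, 4)]
all_orthogonal = all(dot(n, s) == 0 for s in samples)
print(all_orthogonal)
```

By linearity of the inner product, it is enough that `dot(n, u)` and `dot(n, w)` are both zero, which is exactly the argument given in the example.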




Activity 12.12 Can you think of another way of finding $S^\perp$?


If $V$ is a finite-dimensional inner product space (such as $\mathbb{R}^n$), and $S$ is
a subspace of $V$, then an important fact is that every element of $V$ can
be written uniquely as the sum of a vector in $S$ and a vector in $S^\perp$. In
other words, we have the following theorem:


Theorem 12.13 For any subspace $S$ of a finite-dimensional inner product
space $V$, $V = S \oplus S^\perp$.


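Proof: Suppose $\mathbf{z} \in S \cap S^\perp$. Then $\mathbf{z} \in S$ and $\mathbf{z} \perp \mathbf{s}$ for all $\mathbf{s} \in S$. So
$\mathbf{z} \perp \mathbf{z}$, which means $\langle \mathbf{z}, \mathbf{z} \rangle = 0$. So $\|\mathbf{z}\|^2 = 0$ and hence $\mathbf{z} = \mathbf{0}$.
This shows that $S \cap S^\perp \subseteq \{\mathbf{0}\}$. On the other hand, $\mathbf{0} \in S \cap S^\perp$, so
$\{\mathbf{0}\} \subseteq S \cap S^\perp$. It follows that $S \cap S^\perp = \{\mathbf{0}\}$.

Next, we show $V = S + S^\perp$. The cases in which $S = \{\mathbf{0}\}$ or $S = V$
are easily dealt with. So suppose $S \neq \{\mathbf{0}\}$ and $S \neq V$.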

Let $\dim(V) = n$ and let $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_r\}$ be an orthonormal basis
of the subspace $S$. Such a basis exists by the Gram-Schmidt
orthonormalisation process. We can extend this to an orthonormal
basis of $V$,

\[ \{\mathbf{e}_1, \ldots, \mathbf{e}_r, \mathbf{e}_{r+1}, \ldots, \mathbf{e}_n\}, \]

again using the Gram-Schmidt process if necessary. We will show that
$S^\perp = \operatorname{Lin}\{\mathbf{e}_{r+1}, \ldots, \mathbf{e}_n\}$. If we can do this, then since any $\mathbf{v} \in V$ can be
written as

\[ \mathbf{v} = \sum_{i=1}^{n} \alpha_i \mathbf{e}_i = (\alpha_1 \mathbf{e}_1 + \cdots + \alpha_r \mathbf{e}_r) + (\alpha_{r+1} \mathbf{e}_{r+1} + \cdots + \alpha_n \mathbf{e}_n), \]

the quantity in the first parentheses will be in $S$ and the quantity in the
second in $S^\perp$, showing that $V = S + S^\perp$.
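So suppose $\mathbf{v} \in \operatorname{Lin}\{\mathbf{e}_{r+1}, \ldots, \mathbf{e}_n\}$. Then, for some $\alpha_{r+1}, \ldots, \alpha_n$,

\[ \mathbf{v} = \alpha_{r+1} \mathbf{e}_{r+1} + \cdots + \alpha_n \mathbf{e}_n. \]

Any $\mathbf{s} \in S$ can be written in the form

\[ \alpha_1 \mathbf{e}_1 + \cdots + \alpha_r \mathbf{e}_r. \]

So

\[ \langle \mathbf{v}, \mathbf{s} \rangle = \langle \alpha_{r+1} \mathbf{e}_{r+1} + \cdots + \alpha_n \mathbf{e}_n,\ \alpha_1 \mathbf{e}_1 + \cdots + \alpha_r \mathbf{e}_r \rangle. \]

When you expand this inner product, all the terms are of the form
$\alpha_i \alpha_j \langle \mathbf{e}_i, \mathbf{e}_j \rangle$ with $i \neq j$. But these are all 0, by orthonormality. So
$\mathbf{v} \perp \mathbf{s}$ for all $\mathbf{s} \in S$ and hence $\mathbf{v} \in S^\perp$. Conversely, suppose that $\mathbf{v} \in S^\perp$.
Because $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ is a basis of $V$, there are $\alpha_1, \ldots, \alpha_n$ with
$\mathbf{v} = \sum_{i=1}^{n} \alpha_i \mathbf{e}_i$. Now, $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_r \in S$ and $\mathbf{v} \in S^\perp$, so $\langle \mathbf{v}, \mathbf{e}_i \rangle = 0$ for
$i = 1, \ldots, r$. But $\langle \mathbf{v}, \mathbf{e}_i \rangle = \alpha_i$. So $\alpha_1 = \alpha_2 = \cdots = \alpha_r = 0$ and hence $\mathbf{v}$ is a
linear combination of $\mathbf{e}_{r+1}, \ldots, \mathbf{e}_n$ only; that is, $\mathbf{v} \in \operatorname{Lin}\{\mathbf{e}_{r+1}, \ldots, \mathbf{e}_n\}$.

Because $S \cap S^\perp = \{\mathbf{0}\}$, the sum $S + S^\perp$ is direct and therefore
$V = S \oplus S^\perp$.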


Another useful result is the following: 


Theorem 12.14 If $S$ is a subspace of a finite-dimensional inner product
space $V$, then $(S^\perp)^\perp = S$.


You will be able to prove this result yourself when you have worked 
through the exercises at the end of this chapter. 


12.2.2 Orthogonal complements of null spaces and ranges 


There are four important subspaces associated with a matrix $A$: the null
space and the range of $A$, and the null space and range of $A^T$. If $A$ is
an $m \times n$ real matrix, then $N(A) = \{\mathbf{x} \in \mathbb{R}^n \mid A\mathbf{x} = \mathbf{0}\}$ is a subspace of
$\mathbb{R}^n$, $R(A) = \{A\mathbf{x} \mid \mathbf{x} \in \mathbb{R}^n\}$ is a subspace of $\mathbb{R}^m$, $N(A^T)$ is a subspace
of $\mathbb{R}^m$ and $R(A^T)$ is a subspace of $\mathbb{R}^n$. In fact, $R(A^T)$ is just the row
space of $A$, which we met earlier in Section 5.3.1, since the rows of $A$
are the columns of $A^T$.

It’s quite natural, therefore, for us to ask what the orthogonal com- 
plements of these subspaces are. The following result answers these 
questions: the orthogonal complements of the null space and range 
of any matrix are the range and null space of the transpose matrix, 
respectively. 


Theorem 12.15 Suppose that $A$ is an $m \times n$ real matrix. Then $R(A)^\perp = N(A^T)$
and $N(A)^\perp = R(A^T)$.


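Proof: We prove that $R(A)^\perp = N(A^T)$. The other result then follows
by substituting $A^T$ for $A$, obtaining $R(A^T)^\perp = N(A)$ and then, taking
orthogonal complements (and using Theorem 12.14),

\[ R(A^T) = (R(A^T)^\perp)^\perp = N(A)^\perp. \]

The easiest way to show that $R(A)^\perp = N(A^T)$ is to show that $R(A)^\perp \subseteq N(A^T)$
and that $N(A^T) \subseteq R(A)^\perp$. The key fact we're going to use in
this proof is that for any $m \times n$ matrix $A$ and any vectors $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{y} \in \mathbb{R}^m$,

\[ \langle \mathbf{y}, A\mathbf{x} \rangle = \langle A^T\mathbf{y}, \mathbf{x} \rangle, \]

even though $\mathbf{y}, A\mathbf{x} \in \mathbb{R}^m$ and $A^T\mathbf{y}, \mathbf{x} \in \mathbb{R}^n$. This is because the inner
product is a scalar given by $\langle \mathbf{a}, \mathbf{b} \rangle = \mathbf{a}^T\mathbf{b}$, and so

\[ \langle A^T\mathbf{y}, \mathbf{x} \rangle = (A^T\mathbf{y})^T\mathbf{x} = \mathbf{y}^T(A^T)^T\mathbf{x} = \mathbf{y}^T A\mathbf{x} = \mathbf{y}^T(A\mathbf{x}) = \langle \mathbf{y}, A\mathbf{x} \rangle. \]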


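Suppose $\mathbf{z} \in R(A)^\perp$. This means that for all $\mathbf{y} \in R(A)$, $\langle \mathbf{z}, \mathbf{y} \rangle = 0$. Every
$\mathbf{y} \in R(A)$ is of the form $A\mathbf{x}$, by definition of $R(A)$, so if $\mathbf{z} \in R(A)^\perp$, then
for all $\mathbf{x}$, $\langle \mathbf{z}, A\mathbf{x} \rangle = 0$, which means $\langle A^T\mathbf{z}, \mathbf{x} \rangle = 0$ for all $\mathbf{x}$. If we take
$\mathbf{x} = A^T\mathbf{z}$, we see that $\|A^T\mathbf{z}\|^2 = \langle A^T\mathbf{z}, A^T\mathbf{z} \rangle = 0$, so we have $A^T\mathbf{z} = \mathbf{0}$.
This shows that $\mathbf{z} \in N(A^T)$. Hence $R(A)^\perp \subseteq N(A^T)$.

Now suppose $\mathbf{z} \in N(A^T)$. Then $A^T\mathbf{z} = \mathbf{0}$. So, for all $\mathbf{x} \in \mathbb{R}^n$,
$\langle A^T\mathbf{z}, \mathbf{x} \rangle = 0$ and hence $\langle \mathbf{z}, A\mathbf{x} \rangle = 0$, for all $\mathbf{x}$. But this means $\langle \mathbf{z}, \mathbf{y} \rangle = 0$
for all $\mathbf{y} \in R(A)$, and so $\mathbf{z} \in R(A)^\perp$. This shows that $N(A^T) \subseteq R(A)^\perp$.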
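The key identity $\langle \mathbf{y}, A\mathbf{x} \rangle = \langle A^T\mathbf{y}, \mathbf{x} \rangle$ used in the proof can be checked numerically. The sketch below does so for one $3 \times 2$ matrix and one arbitrarily chosen pair of vectors (the particular numbers are assumptions made for illustration), and also confirms that $(1, -1, -1)^T$ lies in $N(A^T)$ for this matrix.

```python
def dot(a, b):
    """Standard inner product."""
    return sum(p * q for p, q in zip(a, b))

def mat_vec(M, v):
    """Matrix-vector product M v, with M stored as a list of rows."""
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1, 1], [2, 0], [-1, 1]]   # a 3 x 2 matrix, so m = 3 and n = 2
x = [4, -7]                      # arbitrary vector in R^2
y = [2, 5, -3]                   # arbitrary vector in R^3

lhs = dot(y, mat_vec(A, x))                # <y, Ax>, computed in R^3
rhs = dot(mat_vec(transpose(A), y), x)     # <A^T y, x>, computed in R^2
print(lhs == rhs)

# z is orthogonal to both columns of A, so A^T z = 0 and z is in N(A^T).
z = [1, -1, -1]
print(mat_vec(transpose(A), z))
```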


Example 12.16 Suppose again that $V = \mathbb{R}^3$ with the standard inner
product and suppose that $S = \operatorname{Lin}\{\mathbf{u}, \mathbf{w}\}$, where $\mathbf{u} = (1, 2, -1)^T$ and
$\mathbf{w} = (1, 0, 1)^T$. Then what is $S^\perp$? Earlier, in Example 12.11, we found
that

\[ S^\perp = \left\{ \begin{pmatrix} r \\ -r \\ -r \end{pmatrix} \;\middle|\; r \in \mathbb{R} \right\} \]

by obtaining the solution $\mathbf{x}$ of the homogeneous system of equations
given by $\langle \mathbf{u}, \mathbf{x} \rangle = 0$ and $\langle \mathbf{w}, \mathbf{x} \rangle = 0$.

We now have another way to confirm this, by using Theorem 12.15.
For, $S = R(A)$ where

\[ A = \begin{pmatrix} 1 & 1 \\ 2 & 0 \\ -1 & 1 \end{pmatrix}. \]

By Theorem 12.15, $S^\perp = (R(A))^\perp = N(A^T)$. Now,

\[ A^T = \begin{pmatrix} 1 & 2 & -1 \\ 1 & 0 & 1 \end{pmatrix}. \]

If we determine $N(A^T)$, we find $S^\perp$. Since $A^T$ is precisely the coefficient
matrix of the homogeneous system of equations we had before, we get
exactly the same answer.


The result that $R(A^T) = N(A)^\perp$ for an $m \times n$ matrix $A$ is just another
way of looking at familiar results concerning a linear system of
homogeneous equations in light of what we now know about orthogonal
complements. The subspace $R(A^T)$ is the linear span of the columns
of $A^T$, so it is the linear span of the rows of $A$. We denoted this
subspace by $RS(A)$ in Section 5.3.1. If $\mathbf{v}$ is any solution of $A\mathbf{x} = \mathbf{0}$, then
since $A\mathbf{v} = \mathbf{0}$ we must have $\langle \mathbf{r}_i, \mathbf{v} \rangle = 0$ for each row $\mathbf{r}_i$, $i = 1, \ldots, m$,
of $A$. Therefore, every vector $\mathbf{v}$ in $N(A)$ is orthogonal to
every vector in $RS(A) = \operatorname{Lin}\{\mathbf{r}_1, \ldots, \mathbf{r}_m\}$, so these subspaces are
orthogonal. In particular, the only vector which is in both subspaces is
the zero vector, $\mathbf{0}$, since such a vector will be orthogonal to itself. But
now we have additional information. We know by Theorem 12.15 that
the row space and the null space are orthogonal complements, so $\mathbb{R}^n$ is
the direct sum of these two subspaces.

As a direct consequence of this observation, we can prove the fol- 
lowing useful fact. 


• If $A$ is an $m \times n$ matrix of rank $n$, then $A^TA$ is invertible.


You have already proved this in Exercise 6.13 and again in Exercise 11.6,
but let's look at it once again using Theorem 12.15. Since $A$ is $m \times n$
and $A^T$ is $n \times m$, the matrix $A^TA$ is a square $n \times n$ matrix. We will show
that the only solution of the system $A^TA\mathbf{x} = \mathbf{0}$ is the trivial solution, so
by Theorem 4.5 this will prove that $A^TA$ is invertible.

Let $\mathbf{v} \in \mathbb{R}^n$ be any solution of $A^TA\mathbf{x} = \mathbf{0}$, so $A^TA\mathbf{v} = \mathbf{0}$. Then the
vector $A\mathbf{v} \in \mathbb{R}^m$ is in the null space of $A^T$ since $A^T(A\mathbf{v}) = \mathbf{0}$ and, also,
$A\mathbf{v}$ is in the range of $A$. But $N(A^T) = R(A)^\perp$, so $N(A^T) \cap R(A) = \{\mathbf{0}\}$.
Therefore we must have $A\mathbf{v} = \mathbf{0}$. But $A$ has full column rank, so the
columns of $A$ are linearly independent and the only solution of this
system is the trivial solution $\mathbf{v} = \mathbf{0}$. This completes the argument.
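To see the role of the rank condition, the following sketch computes $A^TA$ for the rank-2 matrix $A$ of Example 12.16 and for a second $3 \times 2$ matrix whose columns are linearly dependent. The second matrix is a made-up contrast case, not one from the text, and the $2 \times 2$ determinant is used here only as a convenient invertibility test.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A_full = [[1, 1], [2, 0], [-1, 1]]        # columns independent: rank 2
A_deficient = [[1, 2], [2, 4], [-1, -2]]  # second column = 2 * first: rank 1

gram_full = matmul(transpose(A_full), A_full)
gram_deficient = matmul(transpose(A_deficient), A_deficient)

print(gram_full, det2(gram_full))            # nonzero determinant: invertible
print(gram_deficient, det2(gram_deficient))  # zero determinant: not invertible
```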


12.3 Projections 
12.3.1 The definition of a projection 


Suppose that a vector space $V$ can be written as a direct sum of two
subspaces, $U$ and $W$, so $V = U \oplus W$. This means that for each $\mathbf{v} \in V$
there is a unique $\mathbf{u} \in U$ and a unique $\mathbf{w} \in W$ such that $\mathbf{v} = \mathbf{u} + \mathbf{w}$. We
can use this fact to define two functions $P_U : V \to U$ and $P_W : V \to W$
as follows.


Definition 12.17 Suppose that the vector space $V$ is such that
$V = U \oplus W$ where $U$ and $W$ are subspaces of $V$. Define the functions
$P_U : V \to U$ and $P_W : V \to W$ as follows: for each $\mathbf{v} \in V$, if $\mathbf{v} = \mathbf{u} + \mathbf{w}$
where $\mathbf{u} \in U$ and $\mathbf{w} \in W$ (these being unique, since the sum is direct),
then let $P_U(\mathbf{v}) = \mathbf{u}$ and $P_W(\mathbf{v}) = \mathbf{w}$. The mapping $P_U$ is called the
projection of $V$ onto $U$, parallel to $W$. The mapping $P_W$ is called the
projection of $V$ onto $W$, parallel to $U$.


Activity 12.18 Why does the sum $V = U \oplus W$ have to be direct for us
to be able to define $P_U$ and $P_W$?


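Each of the projections $P_U$ and $P_W$ is a linear transformation. For,
suppose that $\mathbf{v}_1, \mathbf{v}_2 \in V$ and $\alpha_1, \alpha_2$ are scalars, and suppose $P_U(\mathbf{v}_1) = \mathbf{u}_1$,
$P_W(\mathbf{v}_1) = \mathbf{w}_1$, $P_U(\mathbf{v}_2) = \mathbf{u}_2$, and $P_W(\mathbf{v}_2) = \mathbf{w}_2$. This means that
$\mathbf{v}_1 = \mathbf{u}_1 + \mathbf{w}_1$ and that $\mathbf{v}_2 = \mathbf{u}_2 + \mathbf{w}_2$ (and also that there are no other
ways of writing $\mathbf{v}_1$ and $\mathbf{v}_2$ in the form $\mathbf{u} + \mathbf{w}$, where $\mathbf{u} \in U$ and $\mathbf{w} \in W$).
Now,

\[
\begin{aligned}
\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 &= \alpha_1(\mathbf{u}_1 + \mathbf{w}_1) + \alpha_2(\mathbf{u}_2 + \mathbf{w}_2) \\
&= \underbrace{(\alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2)}_{\text{in } U} + \underbrace{(\alpha_1\mathbf{w}_1 + \alpha_2\mathbf{w}_2)}_{\text{in } W}.
\end{aligned}
\]

So, $\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 = \mathbf{u} + \mathbf{w}$, where $\mathbf{u} = \alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 \in U$ and
$\mathbf{w} = \alpha_1\mathbf{w}_1 + \alpha_2\mathbf{w}_2 \in W$. Therefore, we must have

\[ P_U(\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2) = \mathbf{u} = \alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 = \alpha_1 P_U(\mathbf{v}_1) + \alpha_2 P_U(\mathbf{v}_2) \]

and

\[ P_W(\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2) = \mathbf{w} = \alpha_1\mathbf{w}_1 + \alpha_2\mathbf{w}_2 = \alpha_1 P_W(\mathbf{v}_1) + \alpha_2 P_W(\mathbf{v}_2). \]


12.3.2 An example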


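Example 12.19 Suppose that $V = \mathbb{R}^3$ with the standard inner
product and suppose that $U = \operatorname{Lin}\{(1, 2, -1)^T, (1, 0, 1)^T\}$. We saw in
Examples 12.11 and 12.16 that

\[ U^\perp = \left\{ \begin{pmatrix} r \\ -r \\ -r \end{pmatrix} \;\middle|\; r \in \mathbb{R} \right\}. \]

Theorem 12.13 tells us that if we let $W = U^\perp$, then $\mathbb{R}^3 = U \oplus W$.
What is the projection $P_U$ of $\mathbb{R}^3$ onto $U$, parallel to $W$? Well, we'll find
a general way of answering such questions later, but let's see if we can
do it directly in this particular case. The fact that the sum of $U$ and
$W$ is direct means that for each $\mathbf{x} = (x, y, z)^T \in \mathbb{R}^3$ there are unique
members $\mathbf{u}$ of $U$ and $\mathbf{w}$ of $W$ so that $\mathbf{x} = \mathbf{u} + \mathbf{w}$. In this case, this means
there are unique $\alpha, \beta, \gamma$ such that

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \underbrace{\alpha \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + \beta \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}}_{\text{in } U} + \underbrace{\gamma \begin{pmatrix} 1 \\ -1 \\ -1 \end{pmatrix}}_{\text{in } W}. \]

This means that $\alpha, \beta, \gamma$ satisfy the linear system

\[
\begin{aligned}
\alpha + \beta + \gamma &= x \\
2\alpha - \gamma &= y \\
-\alpha + \beta - \gamma &= z,
\end{aligned}
\]

which has solution

\[ \alpha = \frac{x}{6} + \frac{y}{3} - \frac{z}{6}, \qquad \beta = \frac{x}{2} + \frac{z}{2}, \qquad \gamma = \frac{x}{3} - \frac{y}{3} - \frac{z}{3}. \]

So the projection $P_U$ is given by

\[ P_U(\mathbf{x}) = \alpha \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + \beta \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} \frac{2}{3}x + \frac{1}{3}y + \frac{1}{3}z \\ \frac{1}{3}x + \frac{2}{3}y - \frac{1}{3}z \\ \frac{1}{3}x - \frac{1}{3}y + \frac{2}{3}z \end{pmatrix}. \]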


There must be an easier way! Well, as we’ll see soon, there is. 
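The decomposition in Example 12.19 can be checked with exact rational arithmetic. The sketch below assumes the coefficient formulas $\alpha = (x + 2y - z)/6$, $\beta = (x + z)/2$, $\gamma = (x - y - z)/3$ (which follow from taking inner products with the three mutually orthogonal vectors involved) and restricts to integer inputs for simplicity; the function names are illustrative.

```python
from fractions import Fraction as F

u1, u2 = (1, 2, -1), (1, 0, 1)   # spanning vectors of U
n = (1, -1, -1)                  # spanning vector of W = U-perp

def coefficients(x, y, z):
    """Coefficients alpha, beta, gamma for integer x, y, z."""
    alpha = F(x + 2 * y - z, 6)
    beta = F(x + z, 2)
    gamma = F(x - y - z, 3)
    return alpha, beta, gamma

def project_onto_U(x, y, z):
    """P_U(x) = alpha*u1 + beta*u2: the part of the vector lying in U."""
    alpha, beta, _ = coefficients(x, y, z)
    return tuple(alpha * a + beta * b for a, b in zip(u1, u2))

def recombine(x, y, z):
    """alpha*u1 + beta*u2 + gamma*n, which should give back (x, y, z)."""
    alpha, beta, gamma = coefficients(x, y, z)
    return tuple(alpha * a + beta * b + gamma * c
                 for a, b, c in zip(u1, u2, n))

v = (1, 0, 0)
print(project_onto_U(*v))
print(recombine(*v) == v)
```

For $\mathbf{v} = (1, 0, 0)^T$ the projection is $(2/3, 1/3, 1/3)^T$, in agreement with the formula for $P_U$ obtained in the example.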


Activity 12.20 Check this solution.


12.3.3 Orthogonal projections


Example 12.19 concerns a special type of projection. It is the projection
onto $S$ parallel to $S^\perp$ (for a particular subspace $S$). Not all projections
are of this form (because, generally, there are many ways to write
$V = U \oplus W$ where $W$ is not $U^\perp$), but this type of projection is called
an orthogonal projection.


Definition 12.21 For a subspace $S$ of any finite-dimensional inner product
space $V$ (such as $\mathbb{R}^n$), the orthogonal projection of $V$ onto $S$ is the
projection onto $S$ parallel to $S^\perp$.


12.4 Characterising projections and orthogonal 
projections 


12.4.1 Projections are idempotents 


Projections have some important properties. We've already seen that
they are linear. Another important property is that any projection $P$
(onto some subspace $U$, parallel to another, $W$, such that $V = U \oplus W$)
satisfies $P^2 = P$. Such a linear transformation is said to be idempotent
(or we say it is an idempotent).


Definition 12.22 The linear transformation $T$ is said to be idempotent
if $T^2 = T$.


This term also applies to the matrix representing an idempotent linear
transformation when $V = \mathbb{R}^n$ or any finite-dimensional vector space.


Definition 12.23 The matrix $A$ is said to be idempotent if $A^2 = A$.


Activity 12.24 As an exercise, show that the only eigenvalues of an 
idempotent matrix A are 0 and 1. 


Theorem 12.25 Any projection is idempotent. 


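Proof: This is quite easy to see. Let $P$ be the projection onto $U$ parallel
to $W$. Let us take any $\mathbf{v} \in V$ and write $\mathbf{v}$ as
$\mathbf{v} = \mathbf{u} + \mathbf{w}$ where $\mathbf{u} \in U$ and $\mathbf{w} \in W$. Then $P(\mathbf{v}) = \mathbf{u}$. What we need
to show is that $P^2(\mathbf{v}) = P(\mathbf{v})$; in other words, $P(P(\mathbf{v})) = P(\mathbf{v})$, which
means $P(\mathbf{u}) = \mathbf{u}$. But, of course, $P(\mathbf{u}) = \mathbf{u}$ because the way $\mathbf{u}$ is written
as a vector in $U$ plus a vector in $W$ is $\mathbf{u} = \mathbf{u} + \mathbf{0}$.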


The fact that $P^2 = P$ means that, for any $n$, $P^n = P$. For example,
$P^3 = P^2P = PP = P^2 = P$. This is where the name 'idempotent'
comes from: 'idem-potent' means all powers are equal ('idem' signifies
equal, and 'potent' power).

In fact, if we have any linear transformation $P$ that satisfies $P^2 = P$,
then it turns out to be a projection. In other words, a linear transformation
is a projection if and only if it is an idempotent.

Theorem 12.26 A linear transformation is a projection if and only if it
is an idempotent.


Proof. We've already seen that any projection is idempotent. Suppose
now that we have an idempotent linear transformation, $P$. So, $P^2 = P$.
Let us define $U$ to be $R(P) = \{P(\mathbf{x}) \mid \mathbf{x} \in V\}$ and let

\[ W = N(P) = \{\mathbf{v} \mid P(\mathbf{v}) = \mathbf{0}\}. \]

We'll show two things: (i) that $V = U \oplus W$ and (ii) that $P$ is the
projection onto $U$ parallel to $W$.
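For (i), we observe that for any $\mathbf{x} \in V$, $\mathbf{x} = P(\mathbf{x}) + (\mathbf{x} - P(\mathbf{x}))$.
Now, $P(\mathbf{x}) \in R(P)$ and, because

\[ P(\mathbf{x} - P(\mathbf{x})) = P(\mathbf{x}) - P^2(\mathbf{x}) = P(\mathbf{x}) - P(\mathbf{x}) = \mathbf{0}, \]

we see that $\mathbf{x} - P(\mathbf{x}) \in N(P)$. So any $\mathbf{x}$ in $V$ can be expressed as the
sum of a vector in $R(P)$ and a vector in $N(P)$ and therefore

\[ V = R(P) + N(P). \]

We need to show that the sum is direct. So suppose that $\mathbf{z} \in R(P) \cap N(P)$.
Then, for some $\mathbf{y}$, we have $\mathbf{z} = P(\mathbf{y})$ and, furthermore, $P(\mathbf{z}) = \mathbf{0}$.
But this implies that $P(P(\mathbf{y})) = \mathbf{0}$. This means $P^2(\mathbf{y}) = \mathbf{0}$. But $P^2 = P$,
so $P(\mathbf{y}) = \mathbf{0}$. Thus, $\mathbf{z} = P(\mathbf{y}) = \mathbf{0}$. On the other hand, since, certainly,
$\mathbf{0} \in R(P) \cap N(P)$, we have $R(P) \cap N(P) = \{\mathbf{0}\}$ and the sum is direct.

We now need to establish (ii). Suppose that $\mathbf{v} \in V$ and that
$\mathbf{v} = \mathbf{u} + \mathbf{w}$, where $\mathbf{u} \in U = R(P)$ and $\mathbf{w} \in W = N(P)$. Because $\mathbf{u} \in R(P)$,
there is some $\mathbf{x}$ such that $\mathbf{u} = P(\mathbf{x})$ and, therefore, since $P^2 = P$,

\[ P(\mathbf{u}) = P(P(\mathbf{x})) = P^2(\mathbf{x}) = P(\mathbf{x}) = \mathbf{u}. \]

Therefore,

\[ P(\mathbf{v}) = P(\mathbf{u} + \mathbf{w}) = P(\mathbf{u}) + P(\mathbf{w}) = \mathbf{u} + \mathbf{0} = \mathbf{u}. \]

This completes the proof.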
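The construction in the proof can be traced on a small concrete case. The $2 \times 2$ matrix below is a hypothetical example chosen for illustration (it does not appear in the text); it is idempotent, and the sketch splits a vector $\mathbf{x}$ into $P\mathbf{x} \in R(P)$ and $\mathbf{x} - P\mathbf{x} \in N(P)$ exactly as in part (i).

```python
def mat_vec(M, v):
    """Matrix-vector product with M stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# An idempotent matrix: R(P) = Lin{(1, 0)} and N(P) = Lin{(1, -1)}.
P = [[1, 1],
     [0, 0]]

is_idempotent = matmul(P, P) == P

def decompose(x):
    """Split x as u + w with u = Px in R(P) and w = x - Px in N(P)."""
    u = mat_vec(P, x)
    w = [xi - ui for xi, ui in zip(x, u)]
    return u, w

x = [3, 5]
u, w = decompose(x)
print(is_idempotent, u, w)
print(mat_vec(P, u) == u)        # P fixes the R(P) component
print(mat_vec(P, w) == [0, 0])   # P kills the N(P) component
```

Here the two subspaces $\operatorname{Lin}\{(1, 0)^T\}$ and $\operatorname{Lin}\{(1, -1)^T\}$ are not perpendicular, so this is a projection that is not an orthogonal projection.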


Note that, for a projection $P$ onto $U$ parallel to $W$, $P\mathbf{x} = \mathbf{0}$ if and only
if $\mathbf{x}$ takes the form $\mathbf{x} = \mathbf{0} + \mathbf{w}$ for some $\mathbf{w} \in W$. So, $N(P) = W$. We
summarise this in the following statement.


• If $P$ is a projection from $V$ onto a subspace $U$ parallel to a subspace
$W$, then $U = R(P)$ and $W = N(P)$.


There is a similar characterisation of orthogonal projections.


Theorem 12.27 If $V$ is a finite-dimensional inner product space and
$P$ is a linear transformation, $P : V \to V$, then $P$ is an orthogonal
projection if and only if the matrix representing $P$ is idempotent and
symmetric.


Proof: For simplicity, let $P$ denote both the linear transformation and
the matrix representing it, and suppose that $P$ is not just idempotent, but
also symmetric ($P = P^T$). Then we know, because it's an idempotent,
that $P$ is the projection onto $R(P)$ parallel to $N(P)$. Now, because
$P = P^T$, $N(P) = N(P^T) = (R(P))^\perp$. So $P$ projects onto $R(P)$ parallel
to $(R(P))^\perp$ and is therefore an orthogonal projection.

Conversely, it's true that any orthogonal projection will be both
idempotent and symmetric. We already know that it must be idempotent,
so we now have to show it is symmetric. Well, suppose that $P$ is the
orthogonal projection onto $U$ (parallel to $U^\perp$). Any $\mathbf{x} \in V$ can be written
uniquely as $\mathbf{x} = \mathbf{u} + \mathbf{u}'$, where $\mathbf{u} \in U$ and $\mathbf{u}' \in U^\perp$, and the projection
$P$ is such that $P\mathbf{x} = \mathbf{u} \in U$. Note, too, that

\[ (I - P)\mathbf{x} = \mathbf{x} - P\mathbf{x} = (\mathbf{u} + \mathbf{u}') - \mathbf{u} = \mathbf{u}' \in U^\perp. \]

So, it follows that, for any $\mathbf{v}, \mathbf{w} \in V$, $P\mathbf{v} \in U$ and $(I - P)\mathbf{w} \in U^\perp$ and
hence $\langle P\mathbf{v}, (I - P)\mathbf{w} \rangle = 0$. That is, $(P\mathbf{v})^T(I - P)\mathbf{w} = 0$, which means
$\mathbf{v}^TP^T(I - P)\mathbf{w} = 0$. Now, the fact that this is true for all $\mathbf{v}, \mathbf{w}$ means
that the matrix $P^T(I - P)$ must be the zero matrix. For, if $\mathbf{e}_i$ denotes,
as usual, the $i$th standard basis vector of $\mathbb{R}^n$, then $\mathbf{e}_i^TP^T(I - P)\mathbf{e}_j$ is
simply the $(i, j)$th entry of the matrix $P^T(I - P)$. So all entries of that
matrix are 0. The fact that $P^T(I - P) = 0$ means that $P^T = P^TP$ and
we therefore have

\[ P = (P^T)^T = (P^TP)^T = P^T(P^T)^T = P^TP = P^T. \]

In other words, $P$ is symmetric. Admittedly, this is a rather sneaky
proof, but it works! Notice, by the way, that it also follows immediately
(though we already know this) that $P$ is idempotent, for

\[ P^2 = PP = P^TP = P^T = P. \]


12.5 Orthogonal projection onto the range 
of a matrix 


Let's start with a simple observation. Suppose that $A$ is an $m \times n$ real
matrix of rank $n$. Then the matrix $A^TA$ is an $n \times n$ matrix, and (as we
have seen in Section 12.2.2) $A^TA$ is invertible.
Therefore, we can compute the matrix

\[ P = A(A^TA)^{-1}A^T. \]

It turns out that this matrix $P$ has a very useful property.


Theorem 12.28 Suppose $A$ is an $m \times n$ real matrix of rank $n$. Then the
matrix $P = A(A^TA)^{-1}A^T$ represents the orthogonal projection of $\mathbb{R}^m$
onto the range $R(A)$ of $A$.


Proof: We show three things: (i) P is idempotent, (ii) P is symmetric, 
(iii) R(P) = R(A). Then, (i) and (ii) establish that P is the orthogo- 
nal projection onto R(P) and, with (iii), it is therefore the orthogonal 
projection onto R(A). 


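First,

\[
\begin{aligned}
P^2 &= (A(A^TA)^{-1}A^T)(A(A^TA)^{-1}A^T) \\
&= A(A^TA)^{-1}(A^TA)(A^TA)^{-1}A^T \\
&= A(A^TA)^{-1}A^T \\
&= P.
\end{aligned}
\]

Next,

\[
\begin{aligned}
P^T &= (A(A^TA)^{-1}A^T)^T \\
&= (A^T)^T((A^TA)^{-1})^TA^T \\
&= A((A^TA)^T)^{-1}A^T \\
&= A(A^TA)^{-1}A^T \\
&= P.
\end{aligned}
\]

Now, clearly, since

\[ P\mathbf{x} = A(A^TA)^{-1}A^T\mathbf{x} = A\left((A^TA)^{-1}A^T\mathbf{x}\right), \]

any vector of the form $P\mathbf{x}$ is also of the form $A\mathbf{y}$ for some $\mathbf{y}$. That is,
$R(P) \subseteq R(A)$. What we need to do, therefore, to show that
$R(P) = R(A)$ is to prove that $R(A) \subseteq R(P)$. So, suppose $\mathbf{z} \in R(A)$, so $\mathbf{z} = A\mathbf{x}$
for some $\mathbf{x}$. Now,

\[ P\mathbf{z} = PA\mathbf{x} = \left(A(A^TA)^{-1}A^T\right)A\mathbf{x} = A(A^TA)^{-1}(A^TA)\mathbf{x} = A\mathbf{x} = \mathbf{z}, \]

so $\mathbf{z} = P\mathbf{z} \in R(P)$. This shows that $R(A) \subseteq R(P)$, and we are
done.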
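The formula of Theorem 12.28 is straightforward to compute with exact fractions. The following is a minimal sketch, assuming $A$ has full column rank so that $A^TA$ is invertible; the helper names (`transpose`, `matmul`, `inverse`, `projection_matrix`) are illustrative, and the inverse is found by plain Gauss-Jordan elimination rather than by any particular library routine. It is applied to the matrix $A$ of Example 12.16.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inverse(M):
    """Gauss-Jordan inverse of a square matrix (assumed invertible)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] +
           [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def projection_matrix(A):
    """P = A (A^T A)^{-1} A^T for a matrix A of full column rank."""
    At = transpose(A)
    return matmul(matmul(A, inverse(matmul(At, A))), At)

A = [[1, 1], [2, 0], [-1, 1]]
P = projection_matrix(A)
print(P)
print(matmul(P, P) == P)       # idempotent
print(transpose(P) == P)       # symmetric
print(matmul(P, A) == A)       # P fixes every column of A, so R(A) is in R(P)
```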


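Example 12.29 In Example 12.19, we determined the orthogonal projection
of $\mathbb{R}^3$ onto $U = \operatorname{Lin}\{(1, 2, -1)^T, (1, 0, 1)^T\}$. We found that this
is given by

\[ P(\mathbf{x}) = \begin{pmatrix} \frac{2}{3}x + \frac{1}{3}y + \frac{1}{3}z \\ \frac{1}{3}x + \frac{2}{3}y - \frac{1}{3}z \\ \frac{1}{3}x - \frac{1}{3}y + \frac{2}{3}z \end{pmatrix}. \]

The calculation we performed there was quite laborious, but Theorem
12.28 makes life easier. What we want is the orthogonal projection
of $\mathbb{R}^3$ onto $R(A)$, where $A$ is the matrix (of rank 2)

\[ A = \begin{pmatrix} 1 & 1 \\ 2 & 0 \\ -1 & 1 \end{pmatrix}. \]

By Theorem 12.28, this projection is represented by the matrix
$P = A(A^TA)^{-1}A^T$. Now,

\[ A^TA = \begin{pmatrix} 6 & 0 \\ 0 & 2 \end{pmatrix}, \]

so

\[
\begin{aligned}
P &= A(A^TA)^{-1}A^T \\
&= \begin{pmatrix} 1 & 1 \\ 2 & 0 \\ -1 & 1 \end{pmatrix} \frac{1}{6} \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 1 & 2 & -1 \\ 1 & 0 & 1 \end{pmatrix} \\
&= \frac{1}{6} \begin{pmatrix} 1 & 1 \\ 2 & 0 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & -1 \\ 3 & 0 & 3 \end{pmatrix} \\
&= \frac{1}{3} \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & -1 \\ 1 & -1 & 2 \end{pmatrix}.
\end{aligned}
\]

So

\[ P\mathbf{x} = \frac{1}{3} \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & -1 \\ 1 & -1 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \]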


which is exactly the same as we determined earlier.


12.6 Minimising the distance to a subspace


Suppose that $U$ is a subspace of $\mathbb{R}^m$ and suppose that $\mathbf{v} \in \mathbb{R}^m$. What is
the smallest distance between $\mathbf{v}$ and any member of $U$? Well, obviously,
if $\mathbf{v} \in U$, then this smallest distance is 0, since $\mathbf{v}$ has distance 0 from
itself. But, generally, for $\mathbf{v} \notin U$, the problem is to find $\mathbf{u} \in U$ such that
the distance $\|\mathbf{v} - \mathbf{u}\|$ is as small as possible, over all choices of $\mathbf{u}$ from
$U$. You are probably familiar with the fact that if you have a line in
two dimensions and a point $p$ not on the line, then the point on the line
closest to $p$ is obtained by 'taking a perpendicular' to the line through
$p$. Where that perpendicular hits the line is the point on the line closest
to $p$. Well, essentially, this is true in general. If we want to find the point
of $U$ closest to $\mathbf{v}$, it will be $P\mathbf{v}$ where $P$ is the orthogonal projection of
$\mathbb{R}^m$ onto $U$. Let's prove this.


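Theorem 12.30 Suppose $U$ is a subspace of $\mathbb{R}^m$, that $\mathbf{v} \in \mathbb{R}^m$, and that
$P$ is the orthogonal projection of $\mathbb{R}^m$ onto $U$. Then for all $\mathbf{u} \in U$,

\[ \|\mathbf{v} - \mathbf{u}\| \geq \|\mathbf{v} - P\mathbf{v}\|. \]

That is, $P\mathbf{v}$ is the closest point in $U$ to $\mathbf{v}$.

Proof: For any $\mathbf{u} \in U$, we have

\[ \|\mathbf{u} - \mathbf{v}\| = \|(\mathbf{u} - P\mathbf{v}) + (P\mathbf{v} - \mathbf{v})\|. \]

Now,

\[ P(P\mathbf{v} - \mathbf{v}) = P^2\mathbf{v} - P\mathbf{v} = P\mathbf{v} - P\mathbf{v} = \mathbf{0}, \]

so $P\mathbf{v} - \mathbf{v} \in N(P) = U^\perp$. Also, $\mathbf{u} - P\mathbf{v} \in U$ because $P\mathbf{v} \in U$ and $U$
is a subspace. So the vectors $\mathbf{u} - P\mathbf{v}$ and $P\mathbf{v} - \mathbf{v}$ are orthogonal. By
the generalised Pythagoras theorem,

\[ \|\mathbf{u} - \mathbf{v}\|^2 = \|\mathbf{u} - P\mathbf{v}\|^2 + \|P\mathbf{v} - \mathbf{v}\|^2. \]

Since $\|\mathbf{u} - P\mathbf{v}\|^2 \geq 0$, this implies

\[ \|\mathbf{u} - \mathbf{v}\|^2 \geq \|P\mathbf{v} - \mathbf{v}\|^2, \]

as required.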
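Theorem 12.30 can be illustrated with the projection matrix found in Example 12.29. The sketch below takes $\mathbf{v} = (1, 0, 0)^T$ (an arbitrary choice made for this illustration) and compares the squared distance from $\mathbf{v}$ to $P\mathbf{v}$ with the squared distance from $\mathbf{v}$ to a grid of other points of $U$. Sampling a grid only illustrates the inequality; the proof above is what establishes it.

```python
from fractions import Fraction as F

# Orthogonal projection onto U = Lin{(1,2,-1), (1,0,1)}, from Example 12.29.
P = [[F(2, 3), F(1, 3), F(1, 3)],
     [F(1, 3), F(2, 3), F(-1, 3)],
     [F(1, 3), F(-1, 3), F(2, 3)]]
u1, u2 = (1, 2, -1), (1, 0, 1)

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dist_sq(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

v = (1, 0, 0)
Pv = mat_vec(P, v)
best = dist_sq(v, Pv)

# Other members of U: integer combinations a*u1 + b*u2.
candidates = [tuple(a * p + b * q for p, q in zip(u1, u2))
              for a in range(-2, 3) for b in range(-2, 3)]
never_closer = all(dist_sq(v, u) >= best for u in candidates)
print(Pv, best, never_closer)
```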


12.7 Fitting functions to data: least squares
approximation


12.7.1 The idea 


Suppose we want to find an equation that models the relationship 
between two quantities X and Y, of the form Y = f(X). In the simplest 
case, we might try to model the relationship by assuming that Y is 
related to X linearly, so that, for some constants a and b, Y = a + bX. 
Now, suppose we have some data which provides pairs of values of X 
and Y. So we have, say, m pairs 


\[ (X_1, Y_1), (X_2, Y_2), \ldots, (X_m, Y_m). \]


Ideally, what we want is to find $a, b$ so that, for each $i$, $Y_i = a + bX_i$.
But this might not be possible. It could be that there is some 'noise'
or measurement error in some of the $X_i$ and $Y_i$ values. Or it could be
that the true relationship between them is more complex. In any case, 
suppose we still want to find a linear relationship that is approximately 
correct. That is, we want to find a, b so that Y = a + bX fits the data 
as well as possible. 

Usually, the appropriate way to measure how good a fit a given
model $Y = a + bX$ gives is to measure the error,
$\sum_{i=1}^{m} (Y_i - (a + bX_i))^2$. If this is small, then the fit is good. And what
we want to do is find $a$ and $b$ for which this measure of error is as small
as it can be. Such values of $a$ and $b$ are called a least squares solution.
(They give the least value of the error, which depends on the squares
of how far $Y_i$ is from $a + bX_i$.)

There are a number of approaches to finding the least squares solu- 
tion. In statistics, you might come across formulas that you learn. We 
can often also find the least squares solution using calculus. But what 
we want to do here is use the linear algebra we’ve developed to show 
how to find a least squares solution. (The method we present can also 
be adapted to handle more complicated ‘fitting’ problems.) 


12.7.2 A linear algebra view 


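The equations $Y_i = a + bX_i$ for $i = 1$ to $m$ can be written in matrix
form as

\[ \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_m \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_m \end{pmatrix}. \]

This can be written $A\mathbf{z} = \mathbf{b}$. As noted, this might not have a solution.
What we want to do instead is find $\mathbf{z} = \begin{pmatrix} a \\ b \end{pmatrix}$ so that the least squares
measure of error is as small as possible. Now, the least squares error is

\[ \sum_{i=1}^{m} (Y_i - (a + bX_i))^2, \]

which is the same as

\[ \|\mathbf{b} - A\mathbf{z}\|^2. \]

So what we need is for $A\mathbf{z}$ to be the closest point of the form $A\mathbf{y}$ to $\mathbf{b}$.
That is, $A\mathbf{z}$ has to be the closest point in $R(A)$ to $\mathbf{b}$. But we know from
Theorem 12.30 that this closest point in $R(A)$ is $P\mathbf{b}$, where $P$ is the
orthogonal projection onto $R(A)$. Assuming that $A$ has rank 2, we also
know, from Theorem 12.28, that $P = A(A^TA)^{-1}A^T$. So what we want
is

\[ A\mathbf{z} = P\mathbf{b} = A(A^TA)^{-1}A^T\mathbf{b}. \]

One solution to this is

\[ \mathbf{z} = (A^TA)^{-1}A^T\mathbf{b}. \]

(In fact, since $A$ has rank 2 its columns are linearly independent, so this
is the only solution.) This is therefore a least squares solution.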
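For the straight-line model it can help to see $A^TA$ and $A^T\mathbf{b}$ written out in terms of the data. The following worked expansion is a sketch using only the matrix $A$ with rows $(1, X_i)$ and the vector $\mathbf{b}$ with entries $Y_i$ described above; the resulting pair of equations is often called the normal equations, a name the text itself does not use.

```latex
\[
A^TA = \begin{pmatrix} m & \sum_{i=1}^{m} X_i \\[2pt] \sum_{i=1}^{m} X_i & \sum_{i=1}^{m} X_i^2 \end{pmatrix},
\qquad
A^T\mathbf{b} = \begin{pmatrix} \sum_{i=1}^{m} Y_i \\[2pt] \sum_{i=1}^{m} X_iY_i \end{pmatrix}.
\]
% z = (A^T A)^{-1} A^T b is the solution of (A^T A) z = A^T b, that is,
\[
\begin{aligned}
m\,a + \Big(\sum_{i=1}^{m} X_i\Big) b &= \sum_{i=1}^{m} Y_i, \\
\Big(\sum_{i=1}^{m} X_i\Big) a + \Big(\sum_{i=1}^{m} X_i^2\Big) b &= \sum_{i=1}^{m} X_iY_i.
\end{aligned}
\]
```

So finding the least squares line amounts to solving a $2 \times 2$ linear system whose entries are simple sums over the data.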


12.7.3 Examples 


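Example 12.31 Suppose we want to find the best fit (in the least squares
sense) relationship of the form $Y = a + bX$ to the following data:

X | 0 | 3 | 6
Y | 1 | 4 | 5

In matrix form, what we want is a least squares solution to the system

\[
\begin{aligned}
a &= 1 \\
a + 3b &= 4 \\
a + 6b &= 5.
\end{aligned}
\]

(You can easily see no exact solution exists.) This system is $A\mathbf{z} = \mathbf{b}$,
where

\[ A = \begin{pmatrix} 1 & 0 \\ 1 & 3 \\ 1 & 6 \end{pmatrix}, \qquad \mathbf{z} = \begin{pmatrix} a \\ b \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} 1 \\ 4 \\ 5 \end{pmatrix}. \]

So a least squares solution is

\[ \mathbf{z} = (A^TA)^{-1}A^T\mathbf{b} = \begin{pmatrix} 4/3 \\ 2/3 \end{pmatrix}. \]

(We've omitted the calculations, but you can check this.) So a best-fit
linear relationship is

\[ Y = \frac{4}{3} + \frac{2}{3}X. \]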


Activity 12.32 Check the calculation in this example. 
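One way to carry out this check is to solve $(A^TA)\mathbf{z} = A^T\mathbf{b}$ directly. For the two-column matrix $A$ with rows $(1, X_i)$, the entries of $A^TA$ and $A^T\mathbf{b}$ are sums over the data, and the $2 \times 2$ system can be solved by Cramer's rule. The function name `fit_line` is an illustrative choice, and the sketch assumes integer data and at least two distinct $X$ values (so that $A$ has rank 2).

```python
from fractions import Fraction

def fit_line(xs, ys):
    """Least squares a, b for Y = a + bX via (A^T A) z = A^T b."""
    m = len(xs)
    sx = sum(xs)                                # sum of X_i
    sxx = sum(x * x for x in xs)                # sum of X_i^2
    sy = sum(ys)                                # sum of Y_i
    sxy = sum(x * y for x, y in zip(xs, ys))    # sum of X_i * Y_i
    det = m * sxx - sx * sx                     # det(A^T A)
    if det == 0:
        raise ValueError("A does not have rank 2: all X values coincide")
    a = Fraction(sxx * sy - sx * sxy, det)
    b = Fraction(m * sxy - sx * sy, det)
    return a, b

# Data of Example 12.31.
a, b = fit_line([0, 3, 6], [1, 4, 5])
print(a, b)
```

For the data of Example 12.31 this returns $a = 4/3$ and $b = 2/3$, matching the solution stated there.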


In principle, the least squares method can be used to fit many types of 
model to data. Here’s another example. 


Example 12.33 Quantities $X, Y$ are related by a rule of the form

\[ Y = \frac{m}{X} + c \]

for some constants $m$ and $c$. Use the following data to estimate $m$ and
$c$ by the least squares method:

X | 1/5 | 1/4 | 1/3 | 1/2 | 1
Y |  4  |  3  |  2  |  2  | 1

This is not a linear relationship between $X$ and $Y$, but when we use the
given values of $X$ and $Y$ we do still get a linear system for $m$ and $c$. For,
what we need is the least squares solution to the system

\[
\begin{aligned}
\frac{m}{1/5} + c &= 4 \\
\frac{m}{1/4} + c &= 3 \\
\frac{m}{1/3} + c &= 2 \\
\frac{m}{1/2} + c &= 2 \\
\frac{m}{1} + c &= 1.
\end{aligned}
\]

In matrix form, this is $A\mathbf{z} = \mathbf{b}$, where

\[ A = \begin{pmatrix} 5 & 1 \\ 4 & 1 \\ 3 & 1 \\ 2 & 1 \\ 1 & 1 \end{pmatrix}, \qquad \mathbf{z} = \begin{pmatrix} m \\ c \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} 4 \\ 3 \\ 2 \\ 2 \\ 1 \end{pmatrix}. \]

So a least squares solution is

\[ \mathbf{z} = (A^TA)^{-1}A^T\mathbf{b} = \begin{pmatrix} 7/10 \\ 3/10 \end{pmatrix}. \]

Therefore, the best fit is

\[ Y = \frac{0.7}{X} + 0.3. \]
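The intermediate steps of Example 12.33 can be written out as follows. This is a worked sketch computed from the coefficient matrix with rows $(1/X_i, 1)$, that is, $(5, 1), (4, 1), (3, 1), (2, 1), (1, 1)$, and the vector of $Y$ values $(4, 3, 2, 2, 1)^T$.

```latex
\[
A^TA = \begin{pmatrix} 25 + 16 + 9 + 4 + 1 & 5 + 4 + 3 + 2 + 1 \\ 5 + 4 + 3 + 2 + 1 & 5 \end{pmatrix}
     = \begin{pmatrix} 55 & 15 \\ 15 & 5 \end{pmatrix},
\qquad
A^T\mathbf{b} = \begin{pmatrix} 20 + 12 + 6 + 4 + 1 \\ 4 + 3 + 2 + 2 + 1 \end{pmatrix}
              = \begin{pmatrix} 43 \\ 12 \end{pmatrix}.
\]
% det(A^T A) = 55*5 - 15*15 = 50
\[
(A^TA)^{-1} = \frac{1}{50} \begin{pmatrix} 5 & -15 \\ -15 & 55 \end{pmatrix},
\qquad
\mathbf{z} = \frac{1}{50} \begin{pmatrix} 5 \cdot 43 - 15 \cdot 12 \\ -15 \cdot 43 + 55 \cdot 12 \end{pmatrix}
           = \frac{1}{50} \begin{pmatrix} 35 \\ 15 \end{pmatrix}
           = \begin{pmatrix} 7/10 \\ 3/10 \end{pmatrix}.
\]
```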


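More complex relationships can also be examined. Suppose, for
instance, that we believe that $X$ and $Y$ are related by

\[ Y = a + bX + cX^2, \]

for some constants $a, b, c$. Suppose we have $m$ data pairs $(X_i, Y_i)$. Then
what we want are values of $a$, $b$ and $c$ which are the best fit to the system

\[
\begin{aligned}
a + bX_1 + cX_1^2 &= Y_1 \\
&\ \,\vdots \\
a + bX_m + cX_m^2 &= Y_m.
\end{aligned}
\]

In matrix form, this is $A\mathbf{z} = \mathbf{b}$, where

\[ A = \begin{pmatrix} 1 & X_1 & X_1^2 \\ 1 & X_2 & X_2^2 \\ \vdots & \vdots & \vdots \\ 1 & X_m & X_m^2 \end{pmatrix}, \qquad \mathbf{z} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_m \end{pmatrix}. \]

Then, assuming $A$ has rank 3, the theory above tells us that a least
squares solution will be $\mathbf{z} = (A^TA)^{-1}A^T\mathbf{b}$.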
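The quadratic case can be sketched in code in the same way: build the matrix $A$ with rows $(1, X_i, X_i^2)$, form $A^TA$ and $A^T\mathbf{b}$, and solve $(A^TA)\mathbf{z} = A^T\mathbf{b}$. The function names and the sample data below are assumptions made for illustration (the data were generated from $Y = 1 + 2X + X^2$, so an exact fit is expected), and Gauss-Jordan elimination stands in for forming $(A^TA)^{-1}$ explicitly.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve M z = rhs by Gauss-Jordan elimination (M square, invertible)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)]
           for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def fit_quadratic(xs, ys):
    """Least squares a, b, c for Y = a + bX + cX^2 (A assumed of rank 3)."""
    A = [[1, x, x * x] for x in xs]
    At = list(zip(*A))                      # rows of A^T = columns of A
    AtA = [[sum(p * q for p, q in zip(r, c)) for c in At] for r in At]
    Atb = [sum(p * y for p, y in zip(r, ys)) for r in At]
    return solve(AtA, Atb)

# Hypothetical data lying exactly on Y = 1 + 2X + X^2.
coeffs = fit_quadratic([0, 1, 2, 3], [1, 4, 9, 16])
print(coeffs)
```

When the data lie exactly on a curve of the assumed form, the least squares error is zero and the method recovers the coefficients of that curve; with noisy data the same code returns the best-fit coefficients instead.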


12.8 Learning outcomes 


You should now be able to: 


• explain what is meant by the sum of two subspaces of a vector space

• explain what it means to say that a sum of two subspaces is a direct
sum

• demonstrate that you know how to prove that a sum is direct

• show that a sum of subspaces is direct

• state the definition of the orthogonal complement of a subspace and
be able to prove properties of the orthogonal complement

• determine the orthogonal complement of a subspace

• demonstrate that you know that, for a matrix $A$, $R(A)^\perp = N(A^T)$
and $N(A)^\perp = R(A^T)$; be able to use these results

• state precisely what is meant by a projection and an orthogonal
projection

• show that projections are linear transformations; show that a matrix
represents a projection if and only if it is idempotent; and that
a matrix represents an orthogonal projection if and only if it is
symmetric and idempotent

• show that the matrix of an orthogonal projection onto $R(A)$, for a
given $m \times n$ matrix $A$ of rank $n$, is $P = A(A^TA)^{-1}A^T$, and be able
to use this to determine such projections

• demonstrate an understanding of the rationale behind least squares
approximation

• explain why a least squares solution to $A\mathbf{z} = \mathbf{b}$ when $A$ is an $m \times n$
matrix of rank $n$ is $\mathbf{z} = (A^TA)^{-1}A^T\mathbf{b}$; and use this in numerical
examples to determine a least squares solution.


12.9 Comments on activities 


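Activity 12.2 Since $\mathbf{0} \in U$ and $\mathbf{0} \in W$, we have $\mathbf{0} = \mathbf{0} + \mathbf{0} \in U + W$
and hence $U + W \neq \emptyset$. Suppose that $\mathbf{v}, \mathbf{v}' \in U + W$, so for some
$\mathbf{u}, \mathbf{u}' \in U$ and $\mathbf{w}, \mathbf{w}' \in W$, $\mathbf{v} = \mathbf{u} + \mathbf{w}$ and $\mathbf{v}' = \mathbf{u}' + \mathbf{w}'$. For scalars $\alpha, \beta$,
we have

\[ \alpha\mathbf{v} + \beta\mathbf{v}' = \alpha(\mathbf{u} + \mathbf{w}) + \beta(\mathbf{u}' + \mathbf{w}') = \underbrace{(\alpha\mathbf{u} + \beta\mathbf{u}')}_{\in U} + \underbrace{(\alpha\mathbf{w} + \beta\mathbf{w}')}_{\in W}, \]

which is in $U + W$.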


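Activity 12.3 Since $\operatorname{Lin}\{\mathbf{u}\}$ is the set of all vectors of the form $\alpha\mathbf{u}$ and
since $\operatorname{Lin}\{\mathbf{w}\}$ is the set of all vectors of the form $\beta\mathbf{w}$, it follows that

\[
\begin{aligned}
\operatorname{Lin}\{\mathbf{u}\} + \operatorname{Lin}\{\mathbf{w}\} &= \{\mathbf{x} + \mathbf{y} \mid \mathbf{x} \in \operatorname{Lin}\{\mathbf{u}\},\ \mathbf{y} \in \operatorname{Lin}\{\mathbf{w}\}\} \\
&= \{\alpha\mathbf{u} + \beta\mathbf{w} \mid \alpha, \beta \in \mathbb{R}\} \\
&= \operatorname{Lin}\{\mathbf{u}, \mathbf{w}\}.
\end{aligned}
\]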


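Activity 12.9 We have $S^\perp \neq \emptyset$ because $\langle \mathbf{0}, \mathbf{s} \rangle = 0$ for all $\mathbf{s} \in S$ and
hence $\mathbf{0} \in S^\perp$. Suppose $\mathbf{u}, \mathbf{v} \in S^\perp$. So, for all $\mathbf{s} \in S$, $\langle \mathbf{u}, \mathbf{s} \rangle = \langle \mathbf{v}, \mathbf{s} \rangle = 0$.
Then, for scalars $\alpha, \beta$, for all $\mathbf{s} \in S$,

\[ \langle \alpha\mathbf{u} + \beta\mathbf{v}, \mathbf{s} \rangle = \alpha\langle \mathbf{u}, \mathbf{s} \rangle + \beta\langle \mathbf{v}, \mathbf{s} \rangle = \alpha 0 + \beta 0 = 0, \]

so $\alpha\mathbf{u} + \beta\mathbf{v} \in S^\perp$.


Activity 12.12 You could find the equation of the plane using

\[ \begin{vmatrix} 1 & 1 & x \\ 2 & 0 & y \\ -1 & 1 & z \end{vmatrix} = 2x - 2y - 2z = 0. \]

So the plane has Cartesian equation $x - y - z = 0$, with a normal
vector $\mathbf{n} = (1, -1, -1)^T$ as a basis of $S^\perp$.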


Activity 12.18 If the sum is not direct, then it won't be the case that
every vector can be written uniquely in the form $\mathbf{u} + \mathbf{w}$. If

\[ \mathbf{v} = \mathbf{u} + \mathbf{w} = \mathbf{u}' + \mathbf{w}' \]

are two different such expressions for $\mathbf{v}$, then it is not possible to define
$P_U(\mathbf{v})$ without ambiguity: is it $\mathbf{u}$ or $\mathbf{u}'$? The definition does not make
sense in this case.


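Activity 12.24 If $\lambda$ is an eigenvalue of $A$, then $A\mathbf{v} = \lambda\mathbf{v}$, where $\mathbf{v} \neq \mathbf{0}$
is a corresponding eigenvector of $A$. If $A$ is idempotent, then also
$A\mathbf{v} = A^2\mathbf{v} = A(A\mathbf{v}) = A(\lambda\mathbf{v}) = \lambda(A\mathbf{v}) = \lambda^2\mathbf{v}$, so we have $\lambda^2\mathbf{v} = \lambda\mathbf{v}$ or
$(\lambda^2 - \lambda)\mathbf{v} = \mathbf{0}$. Since $\mathbf{v} \neq \mathbf{0}$, we conclude that $\lambda^2 - \lambda = \lambda(\lambda - 1) = 0$,
with $\lambda = 0$ or $\lambda = 1$ as the only solutions.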


12.10 Exercises 


Exercise 12.1 Suppose $S$ is a subspace of $\mathbb{R}^n$. Prove that

\[ \dim(S) + \dim(S^\perp) = n. \]


Exercise 12.2 Suppose $S$ is a subspace of $\mathbb{R}^n$. Prove that $S \subseteq (S^\perp)^\perp$.
Prove also that $\dim(S) = \dim((S^\perp)^\perp)$. (You may assume the result of
the previous exercise.) Hence, deduce that $(S^\perp)^\perp = S$.


Exercise 12.3 What is the orthogonal projection of $\mathbb{R}^4$ onto the subspace
spanned by the vectors $(1, 0, 1, 0)^T$ and $(1, 2, 1, 2)^T$?


Exercise 12.4 Let $A$ be an $n \times n$ idempotent matrix which is diagonalisable.
Show that $R(A)$, the range of $A$, is equal to the eigenspace
corresponding to the eigenvalue $\lambda = 1$.


Exercise 12.5 Consider the matrix

\[ A = \begin{pmatrix} 5 & -8 & -4 \\ 3 & -5 & -3 \\ -1 & 2 & 2 \end{pmatrix}, \]

which you diagonalised in Exercise 9.10. Show that $A$ is idempotent.
Let $T$ denote the linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ given by
$T(\mathbf{x}) = A\mathbf{x}$. Deduce that $T$ is a projection. Show that $T$ is the
projection from $\mathbb{R}^3$ onto the eigenspace corresponding to the eigenvalue
$\lambda = 1$, parallel to the eigenspace corresponding to $\lambda = 0$. Is this
an orthogonal projection?


Exercise 12.6 Find a least squares fit by a function of the form
$Y = a + bX$ to the following data:


X | -1 | 0 | 1 | 2
Y |  0 | 1 | 3 | 9


Exercise 12.7 Suppose we want to model the relationship between X
and Y by

\[ Y = a + bX + cX^2 \]

for some constants a, b, c. Find a least squares solution for a, b, c given
the following data:


X | 0 | 1 | 2 | 3
Y | 3 | 2 | 4 | 4


12.11 Problems 


Problem 12.1 Suppose that 


1 0 0 1 

: 0 0 : 1 0 

X = Lin illo and Y = Lin olai 
0 1 0 —1 


Is the sum X + Y direct? If so, why, and if not, why not? Find a basis 
for X +Y. 


Problem 12.2 Let 


Y = Lin 


= 
NO Be 


1 
Find a basis of Y+. 


Problem 12.3 Let U and V be subspaces of an inner product space X,
and let $U^{\perp}$ and $V^{\perp}$ be their orthogonal complements. Prove that

\[ (U + V)^{\perp} = U^{\perp} \cap V^{\perp}. \]


Problem 12.4 Suppose that $\mathbf{u}, \mathbf{w} \in \mathbb{R}^2$ are the vectors


TORRE] 


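Using the definition of a direct sum, show that $\mathbb{R}^2 = \mathrm{Lin}\{\mathbf{u}\} \oplus \mathrm{Lin}\{\mathbf{w}\}$.


(a) Find the projection P of $\mathbb{R}^2$ onto $U = \mathrm{Lin}\{\mathbf{u}\}$ parallel to
$W = \mathrm{Lin}\{\mathbf{w}\}$. Find the image $P(\mathbf{e}_1)$ of the vector $\mathbf{e}_1 = (1, 0)^T$.

(b) Find the orthogonal projection T from $\mathbb{R}^2$ onto $\mathrm{Lin}\{\mathbf{u}\}$. Then find
the image of $\mathbf{e}_1 = (1, 0)^T$ under this linear transformation.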


Problem 12.5 Suppose $p \geq 1$ and that the $n \times n$ real matrix A satisfies
$A^{p+1} = A^p$. Prove that $A^j = A^p$ for all $j \geq p$ and that

\[ \mathbb{R}^n = R(A^p) \oplus N(A^p). \]


Problem 12.6 Let $\mathbf{x} \in \mathbb{R}^n$ and let S be the subspace $\mathrm{Lin}\{\mathbf{x}\}$ spanned by
$\mathbf{x}$. Show that the orthogonal projection matrix P of $\mathbb{R}^n$ onto S is

\[ P = \frac{1}{\|\mathbf{x}\|^2}\,\mathbf{x}\mathbf{x}^T. \]


Problem 12.7 If $\mathbf{z} = (2, -3, 2, -1)^T$, find the matrix of the orthogonal
projection of $\mathbb{R}^4$ onto $\mathrm{Lin}\{\mathbf{z}\}$.


Problem 12.8 Let X be the subspace of $\mathbb{R}^3$ spanned by the vectors
$(0, 1, 1)^T$ and $(2, 1, -1)^T$. Find a basis of $X^{\perp}$.

Find the matrix P representing the orthogonal projection of $\mathbb{R}^3$ onto
X. Is P diagonalisable?

Find an eigenvector of P corresponding to the eigenvalue 0 and an
eigenvector corresponding to the eigenvalue 1.


Problem 12.9 Show that the matrix 
1 2 —2 0 

1 2 7 -1 3 

Page ay a. A 

0 3 3 3 


is idempotent. Why can you conclude that P represents an orthogonal
projection from $\mathbb{R}^4$ to a subspace Y of $\mathbb{R}^4$?

State what is meant by an orthogonal projection. Find subspaces Y
and $Y^{\perp}$ such that P is the orthogonal projection of $\mathbb{R}^4$ onto Y. (Write
down a basis for each subspace.)


Problem 12.10 Suppose A is an $n \times n$ diagonalisable matrix and that
the only eigenvalues of A are 0 and 1. Show that A is idempotent.

Deduce that the linear transformation defined by $T(\mathbf{x}) = A\mathbf{x}$ is a
projection from $\mathbb{R}^n$ onto the eigenspace corresponding to the eigenvalue
$\lambda = 1$ parallel to the eigenspace corresponding to $\lambda = 0$.


Problem 12.11 Let U be the plane in $\mathbb{R}^3$ given by


f(a) 


Find the nearest point (position vector) in U to the point whose position
vector is $\mathbf{p} = (1, 1, 2)^T$.


Problem 12.12 Quantities x, y are known to be related by a rule of the 
form y = ax + b for some constants a and b. Readings are taken of y 
at various values of x, resulting in the following measurements: 


x}/2 |4 |5 | 6 
deals al Or 2S a U2 


Find the least squares estimate of a and b. 


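Problem 12.13 Suppose we want to find the least-squares line
$y = m^*x + c^*$ through the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Show
that the parameters $m^*$ and $c^*$ of the least-squares line are as follows:

\[ m^* = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}, \]

\[ c^* = \frac{\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}. \]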


(These formulae might be familiar from statistics courses.) 
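
As a numerical sanity check of the standard closed-form expressions for the least-squares line, the following minimal Python sketch evaluates the slope and intercept directly from the sums. The function name `least_squares_line` and the data points are hypothetical, chosen only for illustration; they are not taken from the problems above.

```python
def least_squares_line(points):
    """Return (m, c) for the least-squares line y = m x + c.

    Uses the closed-form sums from the normal equations; assumes the
    x-values are not all equal, so the denominator is non-zero.
    """
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)

    denom = n * sum_xx - sum_x ** 2
    m = (n * sum_xy - sum_x * sum_y) / denom
    c = (sum_y * sum_xx - sum_x * sum_xy) / denom
    return m, c


# Hypothetical data lying exactly on y = 2x + 1, so the fit must recover it.
points = [(0, 1), (1, 3), (2, 5), (3, 7)]
m, c = least_squares_line(points)
print(m, c)
```

Because the sample points are collinear, the fitted line passes through all of them; with noisy data the same function returns the line minimising the sum of squared vertical errors.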


13 


Complex matrices and 
vector spaces 


A complex matrix is a matrix whose entries are complex numbers. A 
complex vector space is one for which the scalars are complex numbers. 
We shall see that many of the results we have established for real 
matrices and real vector spaces carry over immediately to complex 
ones, but there are also some significant differences. 

In this chapter, we explore these similarities and differences. We 
look at eigenvalues and eigenvectors of a complex matrix and investi- 
gate unitary diagonalisation, the complex analogue of orthogonal diag- 
onalisation. Certain results for real matrices and vector spaces (such as 
the result that the eigenvalues of a symmetric matrix are real) are easily 
seen as special cases of their complex counterparts. 

We begin with a careful review of complex numbers. 


13.1 Complex numbers 


Consider the two quadratic polynomials $p(x) = x^2 - 3x + 2$ and
$q(x) = x^2 + x + 1$. If you sketch the graph of $p(x)$, you will find
that the graph intersects the x axis at the two real solutions (or
roots) of the equation $p(x) = 0$, and that the polynomial factorises
into two linear factors: $p(x) = x^2 - 3x + 2 = (x - 1)(x - 2)$. Sketching
the graph of $q(x)$, you will find that it does not intersect the x
axis. The equation $q(x) = 0$ has no solution in the real numbers, and
$q(x)$ cannot be factorised over the reals. Such a polynomial is said to be
irreducible. In order to solve this equation, we need to use complex
numbers.


13.1.1 Complex numbers


We begin by defining an imaginary number, which we denote by the
letter i and which has the property that $i^2 = -1$. The term 'imaginary'
is historical, and not an indication that this is a figment of someone's
imagination.


Definition 13.1 A complex number is a number of the form $z = a + ib$,
where a and b are real numbers, and $i^2 = -1$. The set of all such
numbers is

\[ \mathbb{C} = \{a + ib \mid a, b \in \mathbb{R}\}. \]

If $z = a + ib$ is a complex number, then the real number a is known as
the real part of z, denoted Re(z), and the real number b is the imaginary
part of z, denoted Im(z). Note that Im(z) is a real number.

If $b = 0$, then $z = a + ib$ is just the real number a, so $\mathbb{R} \subseteq \mathbb{C}$. If
$a = 0$, then $z = ib$ is said to be purely imaginary.

The quadratic polynomial $q(x) = x^2 + x + 1$ can be factorised over
the complex numbers, because the equation $q(x) = 0$ has two complex
solutions. Solving in the usual way, we have

\[ x = \frac{-1 \pm \sqrt{-3}}{2}. \]

We write $\sqrt{-3} = \sqrt{(-1)3} = \sqrt{-1}\,\sqrt{3} = i\sqrt{3}$, so that the solutions are

\[ w = -\frac{1}{2} + i\frac{\sqrt{3}}{2} \quad\text{and}\quad \bar{w} = -\frac{1}{2} - i\frac{\sqrt{3}}{2}. \]

Notice the form of these two solutions. They are what is called a
conjugate pair. We have the following definition:


Definition 13.2 (Complex conjugate) If $z = a + ib$ is a complex number,
then the complex conjugate of z is the complex number $\bar{z} = a - ib$.


We can see by the application of the quadratic formula that the roots of
an irreducible quadratic polynomial with real coefficients will always
be a conjugate pair of complex numbers.


13.1.2 Algebra of complex numbers 


Addition and multiplication of complex numbers are defined by treating
the numbers as polynomials in i, and using $i^2 = -1$.


Example 13.3 If $z = (1 + i)$ and $w = (4 - 2i)$, then

\[ z + w = (1 + i) + (4 - 2i) = (1 + 4) + i(1 - 2) = 5 - i \]

and

\[ zw = (1 + i)(4 - 2i) = 4 + 4i - 2i - 2i^2 = 6 + 2i. \]

If $z \in \mathbb{C}$, then $z\bar{z}$ is a real number:

\[ z\bar{z} = (a + ib)(a - ib) = a^2 + b^2. \]


Activity 13.4 Carry out the multiplication to verify that $z\bar{z} = a^2 + b^2$.


Division of complex numbers is then defined by

\[ \frac{z}{w} = \frac{z\bar{w}}{w\bar{w}}, \]

noting that $w\bar{w}$ is real.


Example 13.5

\[ \frac{1 + i}{4 - 2i} = \frac{(1 + i)(4 + 2i)}{(4 - 2i)(4 + 2i)} = \frac{2 + 6i}{20} = \frac{1}{10} + \frac{3}{10}i. \]

We now look at some properties of the complex conjugate. A complex
number z is real if and only if $z = \bar{z}$. Indeed, if $z = a + ib$, then $z = \bar{z}$ if
and only if $b = 0$.

The complex conjugate of a complex number satisfies the following
properties:


• $z + \bar{z} = 2\,\mathrm{Re}(z)$ is real,
• $z - \bar{z} = 2i\,\mathrm{Im}(z)$ is purely imaginary,
• $\bar{\bar{z}} = z$,
• $\overline{z + w} = \bar{z} + \bar{w}$,
• $\overline{zw} = \bar{z}\,\bar{w}$,
• $\overline{\left(\dfrac{z}{w}\right)} = \dfrac{\bar{z}}{\bar{w}}$.


Activity 13.6 Let $z = a + ib$, $w = c + id$ and verify all of the above
properties.
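
The arithmetic of Examples 13.3 and 13.5 and the conjugate properties can be checked with a short Python sketch. It relies only on Python's built-in `complex` type, which writes the imaginary unit as `j` rather than `i`; the variable names are ours and purely illustrative.

```python
# z = 1 + i and w = 4 - 2i, as in Example 13.3.
z = complex(1, 1)
w = complex(4, -2)

total = z + w        # add real and imaginary parts separately
product = z * w      # expand the brackets and use i^2 = -1
# Division as defined above: multiply top and bottom by the conjugate of w.
quotient = (z * w.conjugate()) / (w * w.conjugate())

print(total, product, quotient)

# Spot-check some of the conjugate properties for these particular numbers.
assert z + z.conjugate() == 2 * z.real
assert z - z.conjugate() == 2j * z.imag
assert (z + w).conjugate() == z.conjugate() + w.conjugate()
assert (z * w).conjugate() == z.conjugate() * w.conjugate()
```

A numerical check for one pair of numbers is of course not a proof; Activity 13.6 asks for the general verification.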


13.1.3 Roots of polynomials 


The Fundamental Theorem of Algebra asserts that a polynomial of
degree n with complex coefficients has n complex roots (not necessarily
distinct), and can therefore be factorised into n linear factors. If the
coefficients are restricted to real numbers, the polynomial can be
factorised into a product of linear and irreducible quadratic factors over $\mathbb{R}$
and into a product of linear factors over $\mathbb{C}$. The proof of the
Fundamental Theorem of Algebra is beyond the scope of this text. However,
we note the following useful result:


Theorem 13.7 Complex roots of polynomials with real coefficients 
appear in conjugate pairs. 


Proof: Let $P(x) = a_0 + a_1x + \cdots + a_nx^n$, $a_i \in \mathbb{R}$, be a polynomial
of degree n. We shall show that if z is a root of P(x), then so is $\bar{z}$.
Let z be a complex number such that $P(z) = 0$, then

\[ a_0 + a_1z + a_2z^2 + \cdots + a_nz^n = 0. \]

Conjugating both sides of this equation,

\[ \overline{a_0 + a_1z + a_2z^2 + \cdots + a_nz^n} = \bar{0} = 0. \]

Since 0 is a real number, it is equal to its complex conjugate. We
now use the following properties of the complex conjugate: that
the complex conjugate of the sum is the sum of the conjugates, and
the complex conjugate of a product is the product of the conjugates. We
have

\[ \overline{a_0} + \overline{a_1z} + \overline{a_2z^2} + \cdots + \overline{a_nz^n} = 0, \]

and

\[ \bar{a}_0 + \bar{a}_1\bar{z} + \bar{a}_2\bar{z}^2 + \cdots + \bar{a}_n\bar{z}^n = 0. \]

Since the coefficients $a_i$ are real numbers, this becomes

\[ a_0 + a_1\bar{z} + a_2\bar{z}^2 + \cdots + a_n\bar{z}^n = 0. \]

That is, $P(\bar{z}) = 0$, so the number $\bar{z}$ is also a root of P(x).


Example 13.8 Let us consider the polynomial

\[ x^3 - 2x^2 - 2x - 3 = (x - 3)(x^2 + x + 1). \]

If

\[ w = -\frac{1}{2} + i\frac{\sqrt{3}}{2}, \]

then

\[ x^3 - 2x^2 - 2x - 3 = (x - 3)(x - w)(x - \bar{w}). \]


Activity 13.9 Multiply out the last two factors above to check that
their product is the irreducible quadratic $x^2 + x + 1$.


Figure 13.1 Complex plane or Argand diagram


13.1.4 The complex plane


The following theorem shows that a complex number is uniquely deter- 
mined by its real and imaginary parts. 


Theorem 13.10 Two complex numbers are equal if and only if their 
real and imaginary parts are equal. 


Proof. Two complex numbers with the same real parts and the same
imaginary parts are clearly the same complex number, so we only need
to prove this statement in one direction. Let $z = a + ib$ and $w = c + id$.
If $z = w$, we will show that their real and imaginary parts are equal. We
have $a + ib = c + id$, therefore $a - c = i(d - b)$. Squaring both sides,
we obtain $(a - c)^2 = i^2(d - b)^2 = -(d - b)^2$. But $a - c$ and $d - b$
are real numbers, so their squares are non-negative. The only way in
which this equality can hold is if we have $a - c = d - b = 0$; that is,
$a = c$ and $b = d$.


As a result of this theorem, we can think of the complex numbers
geometrically, as points in a plane. For, we can associate the vector $(a, b)^T$
uniquely to each complex number $z = a + ib$, and all the properties of a
two-dimensional real vector space apply. A complex number $z = a + ib$
is represented as a point (a, b) in the complex plane: we draw two axes,
a horizontal axis to represent the real parts of complex numbers and a
vertical axis to represent the imaginary parts of complex numbers, as in
Figure 13.1. Points on the horizontal axis represent real numbers, and
points on the vertical axis represent purely imaginary numbers.


Activity 13.11 Plot $z = 2 + 2i$ and $w = 1 - i\sqrt{3}$ in the complex
plane.
13.1.5 Polar form 


If the complex number $z = a + ib$ is plotted as a point (a, b) in the
complex plane, then we can determine the polar coordinates of this
point. We have

\[ a = r\cos\theta, \qquad b = r\sin\theta, \]

where $r = \sqrt{a^2 + b^2}$ is the length of the line joining the origin to the
point (a, b), and $\theta$ is the angle measured anticlockwise from the real
(horizontal) axis to the line joining the origin to the point (a, b). Then
we can write $z = a + ib = r\cos\theta + ir\sin\theta$.


Definition 13.12 The polar form of the complex number z is

\[ z = r(\cos\theta + i\sin\theta). \]

The length $r = \sqrt{a^2 + b^2}$ is called the modulus of z, denoted |z|, and
the angle $\theta$ is called the argument of z.


Note the following properties:


• z and $\bar{z}$ are reflections in the real axis. If $\theta$ is the argument of z, then
$-\theta$ is the argument of $\bar{z}$.

• $|z|^2 = z\bar{z}$.

• $\theta$ and $\theta + 2n\pi$ give the same complex number.


We define the principal argument of z to be the argument in the range
$-\pi < \theta \leq \pi$, and it is often denoted Arg(z).


Activity 13.13 Express $z = 2 + 2i$, $w = 1 - i\sqrt{3}$ in polar form.


Activity 13.14 Describe the following sets of complex numbers:
(a) $\{z \mid |z| = 3\}$
(b) $\{z \mid \mathrm{Arg}(z) = \pi/4\}$.


Multiplication and division using polar coordinates gives, for
$z = r(\cos\theta + i\sin\theta)$ and $w = \rho(\cos\phi + i\sin\phi)$,

\[ zw = r(\cos\theta + i\sin\theta)\cdot\rho(\cos\phi + i\sin\phi) = r\rho\big(\cos(\theta + \phi) + i\sin(\theta + \phi)\big), \]

\[ \frac{z}{w} = \frac{r}{\rho}\big(\cos(\theta - \phi) + i\sin(\theta - \phi)\big). \]


Activity 13.15 Show these by performing the multiplication and the
division as defined earlier, and by using the facts (trigonometric
identities) that $\cos(\theta + \phi) = \cos\theta\cos\phi - \sin\theta\sin\phi$ and
$\sin(\theta + \phi) = \sin\theta\cos\phi + \cos\theta\sin\phi$.


We consider explicitly a special case of the multiplication result above,
in which $w = z$. Suppose that $z = r(\cos\theta + i\sin\theta)$. If we apply the
above multiplication rule to determine $z^2 = zz$, we have

\[ \begin{aligned} z^2 &= zz \\ &= (r(\cos\theta + i\sin\theta))(r(\cos\theta + i\sin\theta)) \\ &= r^2(\cos^2\theta + i^2\sin^2\theta + 2i\sin\theta\cos\theta) \\ &= r^2(\cos^2\theta - \sin^2\theta + 2i\sin\theta\cos\theta) \\ &= r^2(\cos 2\theta + i\sin 2\theta). \end{aligned} \]

Here we have used the double angle formulae for $\cos 2\theta$ and $\sin 2\theta$.


Applying the product rule n times, where n is a positive integer, we
have

\[ \begin{aligned} z^n &= \underbrace{z \cdots z}_{n \text{ times}} \\ &= (r(\cos\theta + i\sin\theta))^n \\ &= r^n\big(\cos(\underbrace{\theta + \cdots + \theta}_{n \text{ times}}) + i\sin(\underbrace{\theta + \cdots + \theta}_{n \text{ times}})\big). \end{aligned} \]

From the two expressions on the right, we conclude DeMoivre's formula
(or theorem).


Theorem 13.16 (DeMoivre's theorem)

\[ (\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta. \]
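
DeMoivre's theorem is easy to test numerically. The sketch below, using only Python's standard `math` module, evaluates both sides for a sample angle and exponent; the helper name `de_moivre_sides` and the sample values are hypothetical choices for illustration.

```python
import math


def de_moivre_sides(theta, n):
    """Return both sides of DeMoivre's formula for angle theta and integer n."""
    lhs = (math.cos(theta) + 1j * math.sin(theta)) ** n
    rhs = complex(math.cos(n * theta), math.sin(n * theta))
    return lhs, rhs


lhs, rhs = de_moivre_sides(0.7, 5)
# The two sides agree up to floating-point rounding error.
print(abs(lhs - rhs))
```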


13.1.6 Exponential form and Euler’s formula 


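You may be aware that standard functions of a real variable can often
be defined by power series (or Taylor or Maclaurin expansions). These
power series definitions can also be used when the variable is complex.
In particular, we have

\[ e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots \]

\[ \sin z = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots \]

\[ \cos z = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \cdots. \]

If we use the expansion for $e^z$ to expand $e^{i\theta}$, and then factor out the real
and imaginary parts, we find:

\[ \begin{aligned} e^{i\theta} &= 1 + (i\theta) + \frac{(i\theta)^2}{2!} + \frac{(i\theta)^3}{3!} + \frac{(i\theta)^4}{4!} + \frac{(i\theta)^5}{5!} + \cdots \\ &= 1 + i\theta - \frac{\theta^2}{2!} - i\frac{\theta^3}{3!} + \frac{\theta^4}{4!} + i\frac{\theta^5}{5!} - \cdots \\ &= \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \cdots\right) + i\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right). \end{aligned} \]

From this, we may conclude Euler's formula, which is as follows:

\[ e^{i\theta} = \cos\theta + i\sin\theta. \]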


Definition 13.17 The exponential form of a complex number $z = a + ib$
is

\[ z = re^{i\theta}, \]

where $r = |z|$ is the modulus of z and $\theta$ is the argument of z.


Example 13.18 Using the exponential form, we can write

\[ e^{i\pi} + 1 = 0, \]

which combines the numbers e, $\pi$ and i in a single expression.

If $z = re^{i\theta}$, then its complex conjugate is given by $\bar{z} = re^{-i\theta}$. This is
because, if $z = re^{i\theta} = r(\cos\theta + i\sin\theta)$, then

\[ \bar{z} = r(\cos\theta - i\sin\theta) = r(\cos(-\theta) + i\sin(-\theta)) = re^{-i\theta}. \]


We can use either the exponential form, $z = re^{i\theta}$, or the standard form,
$z = a + ib$, according to the application or computation we are doing.
For example, addition is simplest in the form $z = a + ib$, but
multiplication and division are simpler when working with the exponential
form. To change a complex number between $re^{i\theta}$ and $a + ib$, we use
Euler's formula and the complex plane (polar form).


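Example 13.19 Here are two examples of converting from exponential
form to standard form:

\[ e^{i2\pi/3} = \cos\left(\frac{2\pi}{3}\right) + i\sin\left(\frac{2\pi}{3}\right) = -\frac{1}{2} + i\frac{\sqrt{3}}{2}, \]

\[ e^{2 + i\sqrt{3}} = e^2e^{i\sqrt{3}} = e^2\cos\sqrt{3} + ie^2\sin\sqrt{3}. \]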




Activity 13.20 Write each of the following complex numbers in the 
form a + ib: 


illx/6 2—i -3 
: ; 


grit. gre. grrr. e 


Example 13.21 Let $z = 2 + 2i = 2\sqrt{2}\,e^{i\pi/4}$ and
$w = 1 - i\sqrt{3} = 2e^{-i\pi/3}$. Then

\[ w^6 = (1 - i\sqrt{3})^6 = (2e^{-i\pi/3})^6 = 2^6e^{-i2\pi} = 64, \]

\[ zw = (2\sqrt{2}\,e^{i\pi/4})(2e^{-i\pi/3}) = 4\sqrt{2}\,e^{-i\pi/12}, \qquad \frac{z}{w} = \sqrt{2}\,e^{i7\pi/12}. \]

You can see that these calculations are relatively straightforward using
the exponential form of the complex numbers, but they would be quite
involved if we used the standard form.
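
The conversions between standard and exponential form in this example can be reproduced with Python's standard `cmath` module, where `cmath.rect(r, theta)` builds $re^{i\theta}$ and `cmath.polar(z)` returns the modulus together with the principal argument. This is only a numerical illustration of the hand calculation; the variable names are ours.

```python
import cmath
import math

z = cmath.rect(2 * math.sqrt(2), math.pi / 4)   # 2*sqrt(2) e^{i pi/4}, i.e. 2 + 2i
w = cmath.rect(2, -math.pi / 3)                 # 2 e^{-i pi/3}, i.e. 1 - i sqrt(3)

w_sixth = w ** 6                    # moduli multiply, arguments add: 2^6 e^{-2 pi i}
r_prod, arg_prod = cmath.polar(z * w)   # modulus and principal argument of zw
r_quot, arg_quot = cmath.polar(z / w)   # modulus and principal argument of z/w

print(w_sixth)
print(r_prod, arg_prod)
print(r_quot, arg_quot)
```

Up to rounding error, the printed moduli are $4\sqrt{2}$ and $\sqrt{2}$, and the arguments are $-\pi/12$ and $7\pi/12$.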


Notice that in Example 13.21, we are using certain properties of the
complex exponential function, specifically that, if $z, w \in \mathbb{C}$, then

\[ e^{z+w} = e^ze^w \quad\text{and}\quad (e^z)^n = e^{nz} \quad\text{for } n = 1, 2, 3, \ldots. \]

This last property is, in fact, DeMoivre's theorem, and it is easily
generalised to include all integers.

Use of the exponential form sometimes makes solving equations 
easier, as the following example shows: 


Example 13.22 We solve the equation $z^6 = -1$ to find the 6th roots
of $-1$.

Writing $z = re^{i\theta}$, we have $z^6 = (re^{i\theta})^6 = r^6e^{i6\theta}$, and

\[ -1 = e^{i\pi} = e^{i(\pi + 2n\pi)} \quad\text{for } n \in \mathbb{Z}. \]

So we need to solve

\[ r^6e^{i6\theta} = e^{i(\pi + 2n\pi)}. \]

Using the fact that r is a real positive number, we have $r = 1$ and
$6\theta = \pi + 2n\pi$, so

\[ \theta = \frac{\pi}{6} + \frac{2n\pi}{6}. \]

This will give the six complex roots by taking $n = 0, 1, 2, 3, 4, 5$.
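
A short Python sketch can list the six roots produced by this formula and confirm that each one satisfies $z^6 = -1$. It uses `cmath.exp` from the standard library; the list name `roots` is an illustrative choice.

```python
import cmath
import math

# theta = pi/6 + 2 n pi / 6 for n = 0, ..., 5, with modulus r = 1.
roots = [cmath.exp(1j * (math.pi / 6 + 2 * n * math.pi / 6)) for n in range(6)]

for n, root in enumerate(roots):
    # Each sixth power should equal -1 up to rounding error.
    print(n, root, abs(root ** 6 + 1))

# Taking n = 6 adds 2 pi to the argument, so it repeats the n = 0 root.
repeat = cmath.exp(1j * (math.pi / 6 + 2 * 6 * math.pi / 6))
print(abs(repeat - roots[0]))
```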


Activity 13.23 Show this. Write down the sixth roots of $-1$ and show
that any one raised to the power 6 is equal to $-1$. Show that $n = 6$ gives
the same root as $n = 0$. Use this to factor the polynomial $x^6 + 1$ into
linear factors over the complex numbers, and into irreducible quadratics
over the real numbers.


13.2 Complex vector spaces


A vector space where the scalars are complex numbers is called a com- 
plex vector space. The following definition is the same as Definition 5.1 
except that the scalars are complex numbers. 


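Definition 13.24 (Complex vector space) A complex vector space V
is a non-empty set of objects, called vectors, equipped with an addition
operation and a scalar multiplication operation such that for all $\alpha, \beta \in \mathbb{C}$
and all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$:


1. $\mathbf{u} + \mathbf{v} \in V$ (closure under addition).

2. $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ (the commutative law for addition).

3. $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$ (the associative law for addition).

4. There is a single member $\mathbf{0}$ of V, called the zero vector, such that
for all $\mathbf{v} \in V$, $\mathbf{v} + \mathbf{0} = \mathbf{v}$.

5. For every $\mathbf{v} \in V$, there is an element $\mathbf{w} \in V$ (usually written as
$-\mathbf{v}$), called the negative of $\mathbf{v}$, such that $\mathbf{v} + \mathbf{w} = \mathbf{0}$.

6. $\alpha\mathbf{v} \in V$ (closure under scalar multiplication).

7. $\alpha(\mathbf{u} + \mathbf{v}) = \alpha\mathbf{u} + \alpha\mathbf{v}$ (distributive law).

8. $(\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}$ (distributive law).

9. $\alpha(\beta\mathbf{v}) = (\alpha\beta)\mathbf{v}$ (associative law).

10. $1\mathbf{v} = \mathbf{v}$.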


Example 13.25 The set $\mathbb{C}^n$ of n-tuples of complex numbers is a
complex vector space. Just as in $\mathbb{R}^n$, we will write a vector as a column,

\[ \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}, \qquad v_i \in \mathbb{C}. \]

Addition and scalar multiplication are defined component-wise, exactly
as in $\mathbb{R}^n$.


Example 13.26 The set M2(C) of 2 x 2 matrices with complex entries 
is a complex vector space under matrix addition and scalar multiplica- 
tion. 


Most of the results established in Chapter 3 for real vector spaces carry
over immediately to a complex vector space V. All that is necessary
is to change any reference from real numbers to complex numbers.
A linear combination of vectors has the same meaning, except that
the coefficients are complex numbers. That is, $\mathbf{w} \in V$ is a linear
combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k \in V$ if

\[ \mathbf{w} = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_k\mathbf{v}_k, \qquad a_i \in \mathbb{C}. \]

The concepts of subspace, linear span, linear independence, basis and
dimension carry over in the same way. Theorems about $\mathbb{R}^n$ continue to
hold with $\mathbb{R}^n$ changed to $\mathbb{C}^n$.


Example 13.27 Suppose that, for $i = 1, 2, \ldots, n$, the vector $\mathbf{e}_i$ has
every entry equal to 0 except for the ith, which is 1. Then the vectors
$\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ form a basis of $\mathbb{C}^n$. For any $\mathbf{z} = (z_1, z_2, \ldots, z_n)^T \in \mathbb{C}^n$,

\[ \mathbf{z} = z_1\mathbf{e}_1 + z_2\mathbf{e}_2 + \cdots + z_n\mathbf{e}_n. \]

The basis $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ is called the standard basis of $\mathbb{C}^n$, and $\mathbb{C}^n$ is
an n-dimensional complex vector space.


Activity 13.28 $\mathbb{C}^n$ can also be considered as a 2n-dimensional real
vector space. Why? What is a basis for this space?


13.3 Complex matrices 


We will refer to a matrix whose entries are complex numbers as a
complex matrix for short, as opposed to a real matrix (one whose entries
are real numbers). Sometimes, we will just use the term matrix for either,
when this will not cause any confusion. If A is an $m \times n$ complex
matrix, then we denote by $\bar{A}$ the $m \times n$ matrix whose (i, j) entry is the
complex conjugate of the (i, j) entry of A. That is, if $A = (a_{ij})$, then
$\bar{A} = (\bar{a}_{ij})$.

We can use row reduction to solve a system of equations $A\mathbf{x} = \mathbf{b}$,
where A is an $m \times n$ complex matrix, $\mathbf{x} \in \mathbb{C}^n$ and $\mathbf{b} \in \mathbb{C}^m$. Results
concerning the range and null space of a matrix which we established
in previous chapters carry over immediately to complex matrices
with the appropriate modifications. The null space is a subspace
of $\mathbb{C}^n$ and the range, or column space, of the matrix is a subspace
of $\mathbb{C}^m$.

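The concepts of eigenvector and eigenvalue are the same for complex
matrices as for real ones, and the same method is used to find them.
In particular, by working in $\mathbb{C}^n$ rather than $\mathbb{R}^n$ we can now sometimes
diagonalise real matrices with complex eigenvalues, as the following
example shows:


Example 13.29 We find the eigenvalues and corresponding eigenvectors
for the matrix

\[ A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \]

The characteristic equation is

\[ |A - \lambda I| = \begin{vmatrix} -\lambda & 1 \\ -1 & -\lambda \end{vmatrix} = \lambda^2 + 1 = 0, \]

with complex roots $\lambda = \pm i$. We now find the corresponding eigenvectors.

For $\lambda_1 = i$, we solve $(A - iI)\mathbf{x} = \mathbf{0}$ by row reducing the coefficient
matrix,

\[ (A - iI) = \begin{pmatrix} -i & 1 \\ -1 & -i \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & i \\ 0 & 0 \end{pmatrix}, \quad\text{so that}\quad \mathbf{v}_1 = \begin{pmatrix} -i \\ 1 \end{pmatrix}. \]

In the same way, for $\lambda_2 = -i$,

\[ (A + iI) = \begin{pmatrix} i & 1 \\ -1 & i \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & -i \\ 0 & 0 \end{pmatrix}, \quad\text{so that}\quad \mathbf{v}_2 = \begin{pmatrix} i \\ 1 \end{pmatrix}. \]

We can check that these eigenvectors are correct by showing that
$A\mathbf{v}_1 = i\mathbf{v}_1$ and $A\mathbf{v}_2 = -i\mathbf{v}_2$.


Activity 13.30 Check that $A\mathbf{v}_1 = i\mathbf{v}_1$ and $A\mathbf{v}_2 = -i\mathbf{v}_2$.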


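Can we now diagonalise the matrix A in the same way as for real
matrices? The answer is yes. If P is the matrix whose columns are the
eigenvectors, and D is the diagonal matrix of corresponding eigenvalues,
we will show that $P^{-1}AP = D$. We set

\[ P = \begin{pmatrix} -i & i \\ 1 & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \]

and find $P^{-1}$ exactly as with real matrices. We have $|P| = -2i$, so that

\[ P^{-1} = \frac{1}{-2i}\begin{pmatrix} 1 & -i \\ -1 & -i \end{pmatrix} = \frac{i}{2}\begin{pmatrix} 1 & -i \\ -1 & -i \end{pmatrix}. \]

Then

\[ P^{-1}AP = \frac{i}{2}\begin{pmatrix} 1 & -i \\ -1 & -i \end{pmatrix}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} -i & i \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} = D. \]


Activity 13.31 Work through all the calculations in this example.

You may have noticed in this example that the eigenvectors of the
complex conjugate eigenvalues, $\lambda_1 = i$ and $\lambda_2 = \bar{\lambda}_1 = -i$, are complex
conjugate vectors, $\mathbf{v}_2 = \bar{\mathbf{v}}_1$. This is true in general for real matrices.


Theorem 13.32 If A is an $n \times n$ matrix with real entries and if $\lambda$ is
a complex eigenvalue with corresponding eigenvector $\mathbf{v}$, then $\bar{\lambda}$ is also
an eigenvalue of A with corresponding eigenvector $\bar{\mathbf{v}}$.


Proof: Since A is a real matrix, the characteristic equation of A is a
polynomial of degree n with real coefficients, and hence any complex
roots occur in conjugate pairs. This means that if $\lambda$ is an eigenvalue of
A, then so is $\bar{\lambda}$. If $\lambda$ is an eigenvalue with corresponding eigenvector $\mathbf{v}$,
then $A\mathbf{v} = \lambda\mathbf{v}$. Taking the complex conjugate of both sides, $\bar{A}\bar{\mathbf{v}} = \bar{\lambda}\bar{\mathbf{v}}$,
which, since A is real, yields $A\bar{\mathbf{v}} = \bar{\lambda}\bar{\mathbf{v}}$. This says that $\bar{\mathbf{v}}$ is an eigenvector
corresponding to $\bar{\lambda}$.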
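
The diagonalisation of Example 13.29, and the conjugate-pair behaviour described in Theorem 13.32, can be verified with a few lines of Python. The sketch uses plain nested lists and a small hand-written helper `matmul` (our own name, not a library function) so that it needs nothing beyond the built-in `complex` type.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[0, 1], [-1, 0]]
P = [[-1j, 1j], [1, 1]]          # columns are the eigenvectors v1 and v2

# Inverse of a 2 x 2 matrix from its determinant and adjugate.
det_P = P[0][0] * P[1][1] - P[0][1] * P[1][0]
P_inv = [[P[1][1] / det_P, -P[0][1] / det_P],
         [-P[1][0] / det_P, P[0][0] / det_P]]

D = matmul(matmul(P_inv, A), P)  # should be diag(i, -i)
print(det_P)
print(D)

# Theorem 13.32: the second eigenvector is the conjugate of the first.
v1 = [row[0] for row in P]
v2 = [row[1] for row in P]
print([complex(c).conjugate() for c in v1] == v2)
```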


13.4 Complex inner product spaces 


13.4.1 The inner product on C” 


The standard inner product of two vectors $\mathbf{x}, \mathbf{y}$ in $\mathbb{R}^n$ is the real number
$\langle \mathbf{x}, \mathbf{y} \rangle$ given by

\[ \langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T\mathbf{y} = x_1y_1 + x_2y_2 + \cdots + x_ny_n. \]

The norm of a vector $\mathbf{x}$ is given in terms of this inner product by
$\|\mathbf{x}\| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}$. This definition of inner product will not work in $\mathbb{C}^n$. For
example, if $\mathbf{x} = (1, 0, i)^T \in \mathbb{C}^3$, then clearly $\mathbf{x} \neq \mathbf{0}$, but we would have

\[ \|\mathbf{x}\|^2 = x_1^2 + x_2^2 + x_3^2 = 1^2 + 0 + i^2 = 1 - 1 = 0. \]

It seems we need to alter this definition to make it work in a complex
vector space. A good guide to what should be done comes from the
modulus of a complex number. If $z = a + ib$, then $|z|^2 = z\bar{z} = a^2 + b^2$
is a real non-negative number, and $|z| = 0$ only for $z = 0$.


Definition 13.33 For $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$, the standard complex inner product is
defined to be the complex number $\langle \mathbf{x}, \mathbf{y} \rangle$ given by

\[ \langle \mathbf{x}, \mathbf{y} \rangle = x_1\bar{y}_1 + x_2\bar{y}_2 + \cdots + x_n\bar{y}_n. \]


Example 13.34 If $\mathbf{x} = (1, 4i, 3 + i)^T$ and $\mathbf{y} = (i, -3, 1 - 2i)^T$, then

\[ \langle \mathbf{x}, \mathbf{y} \rangle = 1(-i) + 4i(-3) + (3 + i)(1 + 2i) = -i - 12i + (1 + 7i) = 1 - 6i. \]

Since

\[ \langle \mathbf{x}, \mathbf{x} \rangle = x_1\bar{x}_1 + x_2\bar{x}_2 + \cdots + x_n\bar{x}_n = |x_1|^2 + |x_2|^2 + \cdots + |x_n|^2 \]

is the sum of the squares of the moduli of the components of the vector
$\mathbf{x}$, the inner product of a complex vector with itself is a non-negative real
number. Then, the norm of the vector is $\|\mathbf{x}\| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}$, and $\|\mathbf{x}\| = 0$
if and only if $\mathbf{x}$ is the zero vector. This last statement is part of the
following theorem:


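Theorem 13.35 The standard complex inner product

\[ \langle \mathbf{x}, \mathbf{y} \rangle = x_1\bar{y}_1 + x_2\bar{y}_2 + \cdots + x_n\bar{y}_n \qquad (\mathbf{x}, \mathbf{y} \in \mathbb{C}^n) \]

satisfies the following for all $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{C}^n$ and for all $\alpha, \beta \in \mathbb{C}$:


(i) $\langle \mathbf{x}, \mathbf{y} \rangle = \overline{\langle \mathbf{y}, \mathbf{x} \rangle}$
(ii) $\langle \alpha\mathbf{x} + \beta\mathbf{y}, \mathbf{z} \rangle = \alpha\langle \mathbf{x}, \mathbf{z} \rangle + \beta\langle \mathbf{y}, \mathbf{z} \rangle$
(iii) $\langle \mathbf{x}, \mathbf{x} \rangle \geq 0$, and $\langle \mathbf{x}, \mathbf{x} \rangle = 0$ if and only if $\mathbf{x} = \mathbf{0}$.


Proof: We have

\[ \begin{aligned} \langle \mathbf{x}, \mathbf{y} \rangle &= x_1\bar{y}_1 + x_2\bar{y}_2 + \cdots + x_n\bar{y}_n \\ &= \overline{\bar{x}_1y_1} + \overline{\bar{x}_2y_2} + \cdots + \overline{\bar{x}_ny_n} \\ &= \overline{\bar{x}_1y_1 + \bar{x}_2y_2 + \cdots + \bar{x}_ny_n} \\ &= \overline{y_1\bar{x}_1 + y_2\bar{x}_2 + \cdots + y_n\bar{x}_n} \\ &= \overline{\langle \mathbf{y}, \mathbf{x} \rangle}, \end{aligned} \]

which proves (i). We leave the proof of (ii) as an exercise. For (iii), note
that if $x_j = a_j + ib_j$ is the jth component of $\mathbf{x}$, then

\[ \begin{aligned} \langle \mathbf{x}, \mathbf{x} \rangle &= |x_1|^2 + |x_2|^2 + \cdots + |x_n|^2 \\ &= a_1^2 + b_1^2 + a_2^2 + b_2^2 + \cdots + a_n^2 + b_n^2 \end{aligned} \]

is a sum of squares of real numbers, so $\langle \mathbf{x}, \mathbf{x} \rangle \geq 0$, and $\langle \mathbf{x}, \mathbf{x} \rangle = 0$ if and
only if each term $a_j^2$ and $b_j^2$ is equal to 0; that is, if and only if $\mathbf{x}$ is the
zero vector, $\mathbf{x} = \mathbf{0}$.


Activity 13.36 Prove property (ii).


Activity 13.37 Calculate the norm of the vector $\mathbf{x} = (1, 0, i)^T \in \mathbb{C}^3$.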
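
The standard complex inner product is straightforward to compute, and doing so makes the role of the conjugate concrete. The following Python sketch defines a helper `inner` (a name chosen here for illustration) that conjugates the second argument, and applies it to the vectors of Example 13.34.

```python
def inner(x, y):
    """Standard complex inner product: sum of x_k times conjugate(y_k)."""
    return sum(xk * complex(yk).conjugate() for xk, yk in zip(x, y))


x = [1, 4j, 3 + 1j]
y = [1j, -3, 1 - 2j]

xy = inner(x, y)
yx = inner(y, x)
print(xy)                      # the value found in Example 13.34
print(yx == xy.conjugate())    # property (i): swapping arguments conjugates

# Unlike the sum of squares, <x, x> is real and non-negative.
print(inner(x, x))
```

Conjugating the second argument rather than the first is the convention of Definition 13.33; some texts and software libraries conjugate the first argument instead, so it is worth checking which convention is in force.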


13.4.2 Complex inner product in general 


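As with real vector spaces, there is a general notion of inner product on
complex vector spaces.

Definition 13.38 (Complex inner product) Let V be a vector space
over the complex numbers. An inner product on V is a mapping from
(or operation on) pairs of vectors $\mathbf{x}, \mathbf{y}$ to the complex numbers, the
result of which is a complex number denoted $\langle \mathbf{x}, \mathbf{y} \rangle$, which satisfies the
following properties:


(i) $\langle \mathbf{x}, \mathbf{y} \rangle = \overline{\langle \mathbf{y}, \mathbf{x} \rangle}$ for all $\mathbf{x}, \mathbf{y} \in V$.
(ii) $\langle \alpha\mathbf{x} + \beta\mathbf{y}, \mathbf{z} \rangle = \alpha\langle \mathbf{x}, \mathbf{z} \rangle + \beta\langle \mathbf{y}, \mathbf{z} \rangle$ for all $\mathbf{x}, \mathbf{y}, \mathbf{z} \in V$ and all
$\alpha, \beta \in \mathbb{C}$.
(iii) $\langle \mathbf{x}, \mathbf{x} \rangle \geq 0$ is a real number for all $\mathbf{x} \in V$, and $\langle \mathbf{x}, \mathbf{x} \rangle = 0$ if and
only if $\mathbf{x} = \mathbf{0}$, the zero vector of the vector space V.


A vector space with a complex inner product is called a complex inner
product space. From any complex inner product, we can define a norm
by

\[ \|\mathbf{x}\| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}. \]

The inner product defined on $\mathbb{C}^n$ in the previous section is clearly an
inner product under this general definition.

Two further properties, which follow directly from this definition,
are:


• $\langle \mathbf{x}, \alpha\mathbf{y} \rangle = \bar{\alpha}\langle \mathbf{x}, \mathbf{y} \rangle$ for all $\mathbf{x}, \mathbf{y} \in V$ and all $\alpha \in \mathbb{C}$.
• $\langle \mathbf{x}, \mathbf{y} + \mathbf{z} \rangle = \langle \mathbf{x}, \mathbf{y} \rangle + \langle \mathbf{x}, \mathbf{z} \rangle$ for all $\mathbf{x}, \mathbf{y}, \mathbf{z} \in V$.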


Activity 13.39 Use the definition to prove these two additional 
properties. 


Example 13.40 (This is a complex version of Example 10.3.) Suppose
that V is the vector space consisting of all complex polynomial functions
of degree at most n; that is, V consists of all functions $p : x \mapsto p(x)$
of the form

\[ p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n, \qquad a_0, a_1, \ldots, a_n \in \mathbb{C}. \]

The addition and scalar multiplication are, as before, defined pointwise.
Let $x_1, x_2, \ldots, x_{n+1}$ be $n + 1$ fixed, different, complex numbers, and
define, for $p, q \in V$,

\[ \langle p, q \rangle = \sum_{i=1}^{n+1} p(x_i)\overline{q(x_i)}. \]

Then this is an inner product. To see this, we check the properties in
the definition of an inner product. Property (i) follows from properties
of complex numbers and the complex conjugate:

\[ \langle p, q \rangle = \sum_{i=1}^{n+1} p(x_i)\overline{q(x_i)} = \sum_{i=1}^{n+1} \overline{\overline{p(x_i)}\,q(x_i)} = \overline{\sum_{i=1}^{n+1} q(x_i)\overline{p(x_i)}} = \overline{\langle q, p \rangle}. \]

For (iii), we have

\[ \langle p, p \rangle = \sum_{i=1}^{n+1} p(x_i)\overline{p(x_i)} = \sum_{i=1}^{n+1} |p(x_i)|^2 \geq 0, \]

since it is the sum of squares of real numbers. The rest of the argument 
proceeds in exactly the same way as before. If p is the zero vector of the 
vector space (which is the identically-zero function), then (p, p) = 0. To 
complete the verification of (iii), we need to check that, if (p, p) = 0, 
then p must be the zero function. Now, (p, p) = 0 must mean that 
p(x;) = 0 fori =1,2,...,n+1, so p(x) has n+ 1 different roots. 
But p(x) has degree no more than n, so p must be the identically-zero 
function. (The fact that a non-zero polynomial of degree n has no more 
than n distinct roots is just as true for complex numbers as it is for real 
numbers.) As before, part (ii) is left to you. 


13.4.3 Orthogonal vectors 


The definition of orthogonal vectors for real inner product spaces carries 
over exactly to complex ones: 


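Definition 13.41 Two vectors $\mathbf{x}, \mathbf{y}$ in a complex inner product space
are said to be orthogonal if

\[ \langle \mathbf{x}, \mathbf{y} \rangle = 0. \]

We write $\mathbf{x} \perp \mathbf{y}$.


A set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ in a complex inner product space V is
orthogonal if $\langle \mathbf{v}_i, \mathbf{v}_j \rangle = 0$ for $i \neq j$. It is orthonormal if each vector is
also a unit vector; that is, $\langle \mathbf{v}_i, \mathbf{v}_i \rangle = 1$.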

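Just as in $\mathbb{R}^n$, an orthogonal set of non-zero vectors in $\mathbb{C}^n$ is linearly
independent. The proof is essentially the same as the one given for
Theorem 10.14, but we state and prove it for a complex inner product
space to illustrate the modifications. Notice that it is useful to think
ahead about the order in which we choose to place the vectors in the
inner product so that the proof is as straightforward as possible.

Theorem 13.42 Suppose that V is a complex inner product space and
that vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k \in V$ are pairwise orthogonal ($\langle \mathbf{v}_i, \mathbf{v}_j \rangle = 0$
for $i \neq j$), and none is the zero vector. Then $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a linearly
independent set of vectors.


Proof. We need to show that if

\[ \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k = \mathbf{0}, \qquad \alpha_i \in \mathbb{C}, \]

then $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$. Let i be any integer between 1 and k.
Then

\[ \langle \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k, \mathbf{v}_i \rangle = \langle \mathbf{0}, \mathbf{v}_i \rangle = 0. \]

But

\[ \begin{aligned} \langle \alpha_1\mathbf{v}_1 + \cdots + \alpha_k\mathbf{v}_k, \mathbf{v}_i \rangle &= \alpha_1\langle \mathbf{v}_1, \mathbf{v}_i \rangle + \cdots + \alpha_{i-1}\langle \mathbf{v}_{i-1}, \mathbf{v}_i \rangle + \alpha_i\langle \mathbf{v}_i, \mathbf{v}_i \rangle \\ &\quad + \alpha_{i+1}\langle \mathbf{v}_{i+1}, \mathbf{v}_i \rangle + \cdots + \alpha_k\langle \mathbf{v}_k, \mathbf{v}_i \rangle. \end{aligned} \]

Since $\langle \mathbf{v}_j, \mathbf{v}_i \rangle = 0$ for $j \neq i$, this equals $\alpha_i\langle \mathbf{v}_i, \mathbf{v}_i \rangle$, which is $\alpha_i\|\mathbf{v}_i\|^2$. So
we have $\alpha_i\|\mathbf{v}_i\|^2 = 0$. Since $\mathbf{v}_i \neq \mathbf{0}$, $\|\mathbf{v}_i\|^2 \neq 0$ and hence $\alpha_i = 0$. But i
was any integer in the range 1 to k, so we deduce that

\[ \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0, \]

as required.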


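Example 13.43 The vectors

\[ \mathbf{v}_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}, \qquad \mathbf{v}_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} i \\ 1 \end{pmatrix} \]

form an orthonormal basis of $\mathbb{C}^2$. To show this, we calculate the inner
products. They are orthogonal since

\[ \langle \mathbf{v}_1, \mathbf{v}_2 \rangle = \frac{1}{2}\left\langle \begin{pmatrix} 1 \\ i \end{pmatrix}, \begin{pmatrix} i \\ 1 \end{pmatrix} \right\rangle = \frac{1}{2}\big(1(-i) + i(1)\big) = 0 \]

and the norm of each vector is 1 since we have

\[ \|\mathbf{v}_1\|^2 = \frac{1}{2}\big(1(1) + i(-i)\big) = 1 \]

and

\[ \|\mathbf{v}_2\|^2 = \frac{1}{2}\big(i(-i) + 1(1)\big) = 1. \]

The vectors form a basis of $\mathbb{C}^2$ since they are linearly independent and $\mathbb{C}^2$ has
dimension 2.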


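If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a basis of a complex inner product space V, then,
just as for a real vector space, we can apply the Gram-Schmidt
orthonormalisation process to obtain an orthonormal basis of V.

Example 13.44 Let V be the linear span, $V = \mathrm{Lin}\{\mathbf{v}_1, \mathbf{v}_2\}$, of the
vectors $\mathbf{v}_1, \mathbf{v}_2$ in $\mathbb{C}^3$, where

\[ \mathbf{v}_1 = \begin{pmatrix} 1 \\ i \\ 0 \end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix} 1 \\ 2 \\ 1 + i \end{pmatrix}. \]

We will find an orthonormal basis for V. First, we find a unit vector
parallel to $\mathbf{v}_1$. Since $\langle \mathbf{v}_1, \mathbf{v}_1 \rangle = 1(1) + i(-i) = 2$, we set $\mathbf{u}_1 = (1/\sqrt{2})\mathbf{v}_1$.
Then we set

\[ \mathbf{w} = \mathbf{v}_2 - \langle \mathbf{v}_2, \mathbf{u}_1 \rangle\mathbf{u}_1. \]

Calculating the inner product, $\langle \mathbf{v}_2, \mathbf{u}_1 \rangle = (1 - 2i)/\sqrt{2}$, we have

\[ \mathbf{w} = \begin{pmatrix} 1 \\ 2 \\ 1 + i \end{pmatrix} - \frac{1 - 2i}{2}\begin{pmatrix} 1 \\ i \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 1 + i \end{pmatrix} - \begin{pmatrix} \tfrac{1}{2} - i \\ 1 + \tfrac{1}{2}i \\ 0 \end{pmatrix} = \begin{pmatrix} \tfrac{1}{2} + i \\ 1 - \tfrac{1}{2}i \\ 1 + i \end{pmatrix}. \]

We need a unit vector in this direction. To make calculations easier, we
use the parallel vector

\[ \mathbf{w}_2 = \begin{pmatrix} 1 + 2i \\ 2 - i \\ 2 + 2i \end{pmatrix}, \]

and check that $\mathbf{w}_2 \perp \mathbf{v}_1$:

\[ \langle \mathbf{w}_2, \mathbf{v}_1 \rangle = (1 + 2i)(1) + (2 - i)(-i) + (2 + 2i)(0) = 0. \]

We find $\|\mathbf{w}_2\| = \sqrt{18}$, so that

\[ \mathbf{u}_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \\ 0 \end{pmatrix}, \qquad \mathbf{u}_2 = \frac{1}{3\sqrt{2}}\begin{pmatrix} 1 + 2i \\ 2 - i \\ 2 + 2i \end{pmatrix} \]

form an orthonormal basis of V.

Suppose we now wish to find the coordinates of $\mathbf{v}_2$ in this new basis.
As in $\mathbb{R}^n$ (see Theorem 10.20), the coordinates $a_i$ of any vector $\mathbf{v} \in V$
with respect to the orthonormal basis $\{\mathbf{u}_1, \mathbf{u}_2\}$ are given by $a_i = \langle \mathbf{v}, \mathbf{u}_i \rangle$.
Here the order is important, because $\langle \mathbf{u}_i, \mathbf{v} \rangle = \bar{a}_i$. We have

\[ a_1 = \left\langle \begin{pmatrix} 1 \\ 2 \\ 1 + i \end{pmatrix}, \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \\ 0 \end{pmatrix} \right\rangle = \frac{1 - 2i}{\sqrt{2}}, \]

\[ a_2 = \left\langle \begin{pmatrix} 1 \\ 2 \\ 1 + i \end{pmatrix}, \frac{1}{3\sqrt{2}}\begin{pmatrix} 1 + 2i \\ 2 - i \\ 2 + 2i \end{pmatrix} \right\rangle = \frac{9}{3\sqrt{2}} = \frac{3}{\sqrt{2}}, \]

so that

\[ \mathbf{v}_2 = \frac{1 - 2i}{\sqrt{2}}\,\mathbf{u}_1 + \frac{3}{\sqrt{2}}\,\mathbf{u}_2. \]


Activity 13.45 Check all the calculations in this example.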
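
The Gram-Schmidt steps of Example 13.44 translate directly into code. The sketch below is a minimal Python version under the convention of Definition 13.33 (conjugate on the second argument); the helper names `inner`, `scale`, `sub`, `norm` and `gram_schmidt` are our own, and the input vectors are assumed to be linearly independent.

```python
import math


def inner(x, y):
    """Standard complex inner product, conjugating the second argument."""
    return sum(xk * complex(yk).conjugate() for xk, yk in zip(x, y))


def scale(c, x):
    return [c * xk for xk in x]


def sub(x, y):
    return [xk - yk for xk, yk in zip(x, y)]


def norm(x):
    return math.sqrt(inner(x, x).real)


def gram_schmidt(vectors):
    """Orthonormalise a list of linearly independent complex vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            # Subtract the component <v, u> u; the order of arguments matters.
            w = sub(w, scale(inner(v, u), u))
        basis.append(scale(1 / norm(w), w))
    return basis


v1 = [1, 1j, 0]
v2 = [1, 2, 1 + 1j]
u1, u2 = gram_schmidt([v1, v2])

# Coordinates of v2 with respect to the orthonormal basis {u1, u2}.
a1, a2 = inner(v2, u1), inner(v2, u2)
print(u1, u2)
print(a1, a2)
```

Writing the projection coefficient as `inner(v, u)` rather than `inner(u, v)` matters here: swapping the arguments would conjugate the coefficient and the resulting vectors would no longer be orthogonal in general.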


13.5 Hermitian conjugates 
13.5.1 The Hermitian conjugate 


If A is a complex matrix, the Hermitian conjugate, which we denote by
$A^*$, is the matrix $\bar{A}^T$, the result of taking the complex conjugate of every
entry of A and then transposing the matrix. Whereas the transpose of a
real matrix played an important role in the orthogonal diagonalisation
of real matrices, we shall see that it is the Hermitian conjugate which
we need for complex matrices.


Definition 13.46 (Hermitian conjugate) If A is an $m \times n$ matrix with
complex entries, then the Hermitian conjugate of A, denoted by $A^*$, is
defined by

\[ A^* = \bar{A}^T. \]

That is, if $A = (a_{ij})$, then $A^T = (a_{ji})$ and $A^* = \bar{A}^T = (\bar{a}_{ji})$.


Example 13.47

\[ \text{If}\quad A = \begin{pmatrix} i & 5 - 3i & 2 + i \\ 3 & 1 + 2i & 4 - 9i \end{pmatrix}, \quad\text{then}\quad A^* = \begin{pmatrix} -i & 3 \\ 5 + 3i & 1 - 2i \\ 2 - i & 4 + 9i \end{pmatrix}. \]

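If $\mathbf{x}, \mathbf{y}$ are vectors in $\mathbb{C}^n$, then we can express the standard complex
inner product in terms of matrix multiplication as

\[ \langle \mathbf{x}, \mathbf{y} \rangle = x_1\overline{y_1} + x_2\overline{y_2} + \cdots + x_n\overline{y_n} = \overline{y_1}x_1 + \overline{y_2}x_2 + \cdots + \overline{y_n}x_n = \mathbf{y}^*\mathbf{x}. \]

Unfortunately, this is not quite as neat as the corresponding expression
for the inner product on $\mathbb{R}^n$ as a matrix product. (How are they
different?) However, we do have

\[ \langle \mathbf{x}, \mathbf{x} \rangle = \mathbf{x}^*\mathbf{x} = \|\mathbf{x}\|^2. \]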


Compare these properties of the Hermitian conjugate of a matrix with
those of the transpose of a real matrix:

\[ (A^*)^* = A, \qquad (A + B)^* = A^* + B^*, \qquad (AB)^* = B^*A^*. \]

Because the two operations involved in forming a Hermitian conjugate,
taking the conjugate and taking the transpose, commute with each other
(meaning it doesn't matter in which order you perform the operations),
the first two properties follow immediately from the definition of the
Hermitian conjugate. Let us look more closely at the last property.

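We will prove that $(AB)^* = B^*A^*$ by showing that the entries are
the same. If $A = (a_{ij})$ and $B = (b_{ij})$, the $(i, j)$ entry of $AB$ is

\[ a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}, \]

so the $(j, i)$ entry of $(AB)^*$ is

\[ \overline{a_{i1}}\,\overline{b_{1j}} + \overline{a_{i2}}\,\overline{b_{2j}} + \cdots + \overline{a_{in}}\,\overline{b_{nj}}. \]

Now look at the $(j, i)$ entry of $B^*A^*$, which is the matrix product of the
$j$th row of $B^*$ with the $i$th column of $A^*$. The $j$th row of $B^*$ is given
by the complex conjugate of the $j$th column of $B$, $(\overline{b_{1j}}, \overline{b_{2j}}, \ldots, \overline{b_{nj}})$,
and the $i$th column of $A^*$ is the complex conjugate of the $i$th row of $A$,
which is $(\overline{a_{i1}}, \overline{a_{i2}}, \ldots, \overline{a_{in}})^T$. Thus, the $(j, i)$ entry of $B^*A^*$ is

\[ \overline{b_{1j}}\,\overline{a_{i1}} + \overline{b_{2j}}\,\overline{a_{i2}} + \cdots + \overline{b_{nj}}\,\overline{a_{in}}, \]

which is equal to the expression we obtained for the $(j, i)$ entry of
$(AB)^*$.

If $A$ is a matrix with real entries, then $A^* = \overline{A}^T = A^T$. Therefore,
the proof we have just given includes the familiar result for real matrices
$A$ and $B$, that $(AB)^T = B^TA^T$.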
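
The three properties can be spot-checked on small matrices. Below is a minimal sketch in plain Python; the matrices `A` and `B` are hypothetical examples chosen with small integer entries so that exact comparison is safe, and the helpers `herm` and `matmul` are illustrative names.

```python
def herm(M):
    """Hermitian conjugate: conjugate every entry, then transpose."""
    rows, cols = len(M), len(M[0])
    return [[M[i][j].conjugate() for i in range(rows)] for j in range(cols)]

def matmul(A, B):
    """Ordinary matrix product of nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical 2 x 3 and 3 x 2 complex matrices.
A = [[1 + 2j, 0j, 3j],
     [2 - 1j, 1 + 0j, 4 + 0j]]
B = [[1j, 2 + 0j],
     [1 - 1j, 0j],
     [3 + 0j, -1j]]

lhs = herm(matmul(A, B))        # (AB)*
rhs = matmul(herm(B), herm(A))  # B* A*
print(lhs == rhs)               # the two sides agree entry by entry
print(herm(herm(A)) == A)       # (A*)* = A
```

Note the reversal of order on the right-hand side: `matmul(herm(A), herm(B))` is a product of a 3 x 2 matrix with a 2 x 3 matrix, which here would not even have the same size as `(AB)*`.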


Activity 13.48 Show that for any complex matrix $A$ and any complex
number $k$,

\[ (kA)^* = \overline{k}A^*. \]

What is the analogous result for real matrices and real numbers?


Often, the term adjoint is used instead of Hermitian conjugate. But we 
have already used that terminology for something completely different 
in Chapter 3 (in the context of finding the inverse of a matrix), so we 
will avoid it: but we wanted to let you know, to avoid confusion. 


13.5.2 Hermitian matrices 
Recall that a real matrix $A$ is symmetric if $A = A^T$. The complex
analogue is a Hermitian matrix.


Definition 13.49 (Hermitian matrix) An $n \times n$ complex matrix $A$ is
Hermitian if and only if

\[ A = A^*. \]

A Hermitian matrix with real entries is a symmetric matrix, since $A =
A^* = A^T$. So what does a Hermitian matrix look like? If $A = (a_{ij})$ is
equal to $A^* = (\overline{a_{ji}})$, then the diagonal entries must be real numbers,
since they satisfy $a_{ii} = \overline{a_{ii}}$. The corresponding entries across the main
diagonal must be complex conjugates of one another.


Example 13.50 The matrix

\[ A = \begin{pmatrix} 1 & 1+2i & 4-i \\ 1-2i & -3 & i \\ 4+i & -i & 2 \end{pmatrix} \]

is a Hermitian matrix.


Activity 13.51 Check that $A^* = \overline{A}^T = A$.
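
A check of this kind is easy to automate. The sketch below, in plain Python, assumes the matrix of Example 13.50 has rows $(1, 1+2i, 4-i)$, $(1-2i, -3, i)$ and $(4+i, -i, 2)$; the function names are illustrative.

```python
def herm(M):
    """Hermitian conjugate: conjugate every entry, then transpose."""
    return [[M[i][j].conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

def is_hermitian(M):
    """True when M equals its Hermitian conjugate (exact comparison)."""
    return M == herm(M)

A = [[1, 1 + 2j, 4 - 1j],
     [1 - 2j, -3, 1j],
     [4 + 1j, -1j, 2]]

print(is_hermitian(A))
# The diagonal of a Hermitian matrix is real.
print(all(A[i][i].imag == 0 for i in range(3)))

# A symmetric complex matrix need not be Hermitian (hypothetical example).
S = [[1, 1j],
     [1j, 1]]
print(is_hermitian(S))
```

Exact equality is acceptable here because the entries are small integers; for matrices with floating-point entries a tolerance-based comparison would be the safer choice.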


When we looked at orthogonal diagonalisation of symmetric matrices, 
we stated (in the proof of Theorem 11.5) that the eigenvalues of a 
symmetric matrix are real. We can now prove this. The result is a 
corollary of the following theorem: 


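Theorem 13.52 If $A$ is a Hermitian matrix, then the eigenvalues of $A$
are real.


Proof. Suppose $\lambda$ is an eigenvalue of $A$ with corresponding eigenvector
$\mathbf{v}$. Then $A\mathbf{v} = \lambda\mathbf{v}$ and $\mathbf{v} \neq \mathbf{0}$. We multiply this equality on the left by
the Hermitian conjugate of $\mathbf{v}$, obtaining

\[ \mathbf{v}^*A\mathbf{v} = \mathbf{v}^*\lambda\mathbf{v} = \lambda\mathbf{v}^*\mathbf{v} = \lambda\|\mathbf{v}\|^2, \]

where the norm of $\mathbf{v}$ is a positive real number. On the other hand, taking
the complex conjugate transpose of both sides of $A\mathbf{v} = \lambda\mathbf{v}$, we have

\[ (A\mathbf{v})^* = (\lambda\mathbf{v})^*, \]

which gives

\[ \mathbf{v}^*A^* = \overline{\lambda}\mathbf{v}^*. \]

We then multiply this last equality on the right by $\mathbf{v}$ to get

\[ \mathbf{v}^*A^*\mathbf{v} = \overline{\lambda}\mathbf{v}^*\mathbf{v} = \overline{\lambda}\|\mathbf{v}\|^2. \]

Since $A$ is Hermitian, $\mathbf{v}^*A\mathbf{v} = \mathbf{v}^*A^*\mathbf{v}$, and therefore it follows that

\[ \lambda\|\mathbf{v}\|^2 = \overline{\lambda}\|\mathbf{v}\|^2. \]

Since $\|\mathbf{v}\|^2 \neq 0$, we conclude that $\lambda = \overline{\lambda}$; that is, $\lambda$ is real.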


This has as an immediate consequence the following important
fact that we used in Chapter 11 to prove the Spectral theorem
(Theorem 11.5); that is, Theorem 11.7 is the following corollary of
Theorem 13.52:


Corollary 13.53 If $A$ is a real symmetric matrix, then the eigenvalues
of $A$ are real.


As with real symmetric matrices, it is also true for Hermitian
matrices that eigenvectors corresponding to different eigenvalues are
orthogonal.


Theorem 13.54 If the matrix $A$ is Hermitian, then eigenvectors corresponding
to distinct eigenvalues are orthogonal.


Activity 13.55 Prove this theorem. Look at the proof of Theorem 11.8
and rework it for the complex case.


13.5.3 Unitary matrices 


The counterpart for complex matrices to an orthogonal matrix is a 
unitary matrix. 


Definition 13.56 An $n \times n$ complex matrix $P$ is said to be unitary if
and only if $PP^* = P^*P = I$; that is, if $P$ has inverse $P^*$.


Example 13.57 The matrix 


r= (3 A) 


is a unitary matrix.

Activity 13.58 Check this.


An immediate consequence of this definition is that if $P$ is a unitary
matrix, then so is $P^*$.


Activity 13.59 Show this. Show, also, that if $A$ and $B$ are unitary
matrices, then so is their product $AB$.


A unitary matrix $P$ with real entries is an orthogonal matrix, since then
$P^* = P^T$. Recall that a matrix is orthogonal if and only if its columns
are an orthonormal basis of $\mathbb{R}^n$. We prove the analogous result for
unitary matrices.


Theorem 13.60 The $n \times n$ matrix $P$ is unitary if and only if the columns
of $P$ are an orthonormal basis of $\mathbb{C}^n$.
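
Proof: The proof of this theorem follows the same argument as the
proof of Theorem 10.21. It is an 'if and only if' statement, so we must
prove it in both directions.

Let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ be the columns of the matrix $P$. Then the rows
of $P^*$ are the complex conjugate transposes of these vectors.

If $I = P^*P$, we have

\[ \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} = \begin{pmatrix} \mathbf{x}_1^* \\ \mathbf{x}_2^* \\ \vdots \\ \mathbf{x}_n^* \end{pmatrix} (\mathbf{x}_1 \ \mathbf{x}_2 \ \cdots \ \mathbf{x}_n) = \begin{pmatrix} \mathbf{x}_1^*\mathbf{x}_1 & \mathbf{x}_1^*\mathbf{x}_2 & \cdots & \mathbf{x}_1^*\mathbf{x}_n \\ \mathbf{x}_2^*\mathbf{x}_1 & \mathbf{x}_2^*\mathbf{x}_2 & \cdots & \mathbf{x}_2^*\mathbf{x}_n \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{x}_n^*\mathbf{x}_1 & \mathbf{x}_n^*\mathbf{x}_2 & \cdots & \mathbf{x}_n^*\mathbf{x}_n \end{pmatrix}. \]

Equating the entries of these matrices, we have $\mathbf{x}_i^*\mathbf{x}_j = \langle \mathbf{x}_j, \mathbf{x}_i \rangle = 0$ if
$i \neq j$ and $\mathbf{x}_i^*\mathbf{x}_i = \langle \mathbf{x}_i, \mathbf{x}_i \rangle = 1$ if $i = j$, which means precisely that the
columns $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$ are an orthonormal set of vectors. They are
therefore linearly independent, and since there are $n$ of them, they are
a basis of $\mathbb{C}^n$.

Conversely, if the columns of $P$ are an orthonormal basis of $\mathbb{C}^n$, then
the matrix product $P^*P$ as shown above must be the identity matrix, so
that $P^*P = I$. This says that $P^* = P^{-1}$, so also $PP^* = I$.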


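Since $P^*$ is also unitary, this result applies to the rows of the matrix $P$.

Just as for an orthogonal matrix, the linear transformation defined
by a unitary matrix $P$ is an isometry, meaning that it preserves the inner
product, and therefore the length of any vector. In fact, this characterises
a unitary matrix, as the following theorem shows:


Theorem 13.61 The matrix $P$ is unitary if and only if the linear transformation
defined by $P$ preserves the standard complex inner product;
that is, $\langle P\mathbf{x}, P\mathbf{y} \rangle = \langle \mathbf{x}, \mathbf{y} \rangle$ for all $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$.


Proof: If $P$ is a unitary matrix, then

\[ \langle P\mathbf{x}, P\mathbf{y} \rangle = (P\mathbf{y})^*(P\mathbf{x}) = \mathbf{y}^*P^*P\mathbf{x} = \mathbf{y}^*I\mathbf{x} = \mathbf{y}^*\mathbf{x} = \langle \mathbf{x}, \mathbf{y} \rangle, \]

so $P$ preserves the inner product.
Conversely, assume we have a matrix $P$ for which

\[ \langle P\mathbf{x}, P\mathbf{y} \rangle = \langle \mathbf{x}, \mathbf{y} \rangle \]

for all $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$. Let $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ denote (as usual) the standard
basis of $\mathbb{C}^n$. Then $P\mathbf{e}_i = \mathbf{v}_i$, where $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are the columns
of $P$. We have

\[ \langle \mathbf{v}_i, \mathbf{v}_j \rangle = \langle P\mathbf{e}_i, P\mathbf{e}_j \rangle = \langle \mathbf{e}_i, \mathbf{e}_j \rangle, \]

from which we deduce that the columns of $P$ are an orthonormal basis
of $\mathbb{C}^n$, and therefore $P$ is a unitary matrix.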
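
Both characterisations of a unitary matrix can be observed numerically. The sketch below, in plain Python, uses a hypothetical unitary matrix $P = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}$ and two arbitrary test vectors; none of these values is taken from the text.

```python
import math

def inner(x, y):
    """Standard complex inner product <x, y> = sum of x_i * conj(y_i)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def apply(P, x):
    """Matrix-vector product P x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

s = 1 / math.sqrt(2)
P = [[s, s * 1j],
     [s * 1j, s]]            # hypothetical unitary matrix

# Columns of P form an orthonormal set (Theorem 13.60).
cols = [[P[0][0], P[1][0]], [P[0][1], P[1][1]]]
print(inner(cols[0], cols[1]), inner(cols[0], cols[0]), inner(cols[1], cols[1]))

# P preserves the inner product (Theorem 13.61).
x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 1j]
before = inner(x, y)
after = inner(apply(P, x), apply(P, y))
print(before, after)
```

The two printed inner products agree only up to floating-point rounding, since the entries of `P` involve $1/\sqrt{2}$.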


13.6 Unitary diagonalisation and normal matrices 


Recalling the definition of orthogonal diagonalisation (Section 11.1), 
the following definition will come as no surprise: 


Definition 13.62 A matrix A is said to be unitarily diagonalisable if 
there is a unitary matrix P such that P* AP = D, where D is a diagonal 
matrix. 


Suppose the matrix $A$ can be unitarily diagonalised, with $P^*AP = D$.
Since the matrix $P$ diagonalises $A$, the columns of $P$ are a basis of $\mathbb{C}^n$
consisting of eigenvectors of $A$. Since $P$ is unitary, the columns of $P$
are an orthonormal basis of $\mathbb{C}^n$. That is, if $P$ unitarily diagonalises $A$,
then the columns of $P$ are an orthonormal basis of $\mathbb{C}^n$ consisting of
eigenvectors of $A$.

Conversely, if the eigenvectors of $A$ are an orthonormal basis of
$\mathbb{C}^n$, then the matrix $P$ whose columns are these basis vectors is unitary.
Since the vectors are eigenvectors of $A$, we have $AP = PD$, where
$D$ is the diagonal matrix of corresponding eigenvalues. Since $P^{-1} =
P^*$, we have $P^*AP = D$, so that $A$ is unitarily diagonalised. This is
summarised in the following theorem:


Theorem 13.63 The matrix $A$ can be unitarily diagonalised if and only
if there is an orthonormal basis of $\mathbb{C}^n$ consisting of eigenvectors of $A$.


For real matrices, only a symmetric matrix can be orthogonally 
diagonalised. Considering what we have done so far, it is natural to ask 
if there is an analogous result for complex matrices, but the result for 
complex matrices is quite different. Whereas it is true that a Hermitian 
matrix can be unitarily diagonalised, these are not the only matrices for 
which this is true. There is a much larger class of complex matrices 
which can be unitarily diagonalised. These are the normal matrices. 


Definition 13.64 (Normal matrix) An n x n complex matrix A is 
called normal if 


AA* = A*A. 



Every Hermitian matrix is normal since $AA^* = AA = A^*A$. Also,
every unitary matrix is normal since $AA^* = I = A^*A$.

Furthermore, every diagonal matrix is normal. To see this, let $D =
\mathrm{diag}(d_1, \ldots, d_n)$, meaning $d_i$ is the entry in the $(i, i)$ position and all
other entries are zero. Then $D^*$ is the diagonal matrix with $\overline{d_i}$ in the
$(i, i)$ position and zeros elsewhere. Therefore,

\[ DD^* = \mathrm{diag}(|d_1|^2, |d_2|^2, \ldots, |d_n|^2) = D^*D. \]

This shows that $D$ is normal, and also that the entries of the diagonal
matrix $DD^*$ are real. Diagonal matrices provide some simple examples
of matrices that are normal, but neither Hermitian nor unitary.


Activity 13.65 Write down a diagonal matrix which is not Hermitian 
and not unitary. 


We state the following important result: 


Theorem 13.66 The matrix A is unitarily diagonalisable if and only if 
A is normal. 


We will prove this theorem in one direction only: if A is unitarily 
diagonalisable, then A is normal. This means that only normal matrices 
can be unitarily diagonalised. The proof that if A is normal then A 
can be unitarily diagonalised requires additional theory and will not be 
given in this book. 


Proof: [that only normal matrices can be unitarily diagonalised.] Suppose
$A$ can be unitarily diagonalised. Then there is a unitary matrix $P$
and a diagonal matrix $D$ such that $P^*AP = D$. Solving for $A$, we have
$A = PDP^*$. Then

\[ AA^* = (PDP^*)(PDP^*)^* = (PDP^*)(PD^*P^*) = PD(P^*P)D^*P^* = P(DD^*)P^*. \]

In the same way,

\[ A^*A = (PDP^*)^*(PDP^*) = (PD^*P^*)(PDP^*) = PD^*(P^*P)DP^* = P(D^*D)P^*. \]

Since $D$ is diagonal, it is normal, so that $P(DD^*)P^* = P(D^*D)P^*$,
from which we conclude that $A$ is normal.
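
The defining condition $AA^* = A^*A$ is straightforward to test directly. The following plain-Python sketch uses three hypothetical matrices with integer entries (so exact comparison is safe): a real matrix that is normal but neither symmetric nor orthogonal, a diagonal matrix that is neither Hermitian nor unitary, and a matrix that is not normal.

```python
def herm(M):
    """Hermitian conjugate: conjugate every entry, then transpose."""
    return [[M[i][j].conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

def matmul(A, B):
    """Ordinary matrix product of nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_normal(A):
    """True when A A* equals A* A (exact comparison; fine for integer entries)."""
    return matmul(A, herm(A)) == matmul(herm(A), A)

N = [[1, 1], [-1, 1]]    # normal; not symmetric, and N N^T = 2I so not orthogonal
D = [[1j, 0], [0, 2]]    # diagonal, hence normal; not Hermitian, not unitary
J = [[1, 1], [0, 1]]     # not normal

print(is_normal(N), is_normal(D), is_normal(J))
```

By Theorem 13.66, the first two matrices can be unitarily diagonalised and the third cannot.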


How do we unitarily diagonalise a normal matrix $A$? We carry out
the same steps as for orthogonal diagonalisation. First, we solve the
characteristic equation of $A$ to find the eigenvalues. For each eigenvalue
$\lambda$, we find an orthonormal basis for the corresponding eigenspace, using
Gram-Schmidt if necessary. Then the set of all such eigenvectors is an
orthonormal basis of $\mathbb{C}^n$. That this is always possible is the content of
the above theorem. We form the matrix $P$ with these eigenvectors as the
columns. Then $P$ is unitary, and $P^*AP = D$, where $D$ is the diagonal
matrix of corresponding eigenvalues.

All the examples you have seen of orthogonal diagonalisation are
examples of unitary diagonalisation in the case where $A$ is a real symmetric
matrix (Section 11.1). We now give an example for a complex
matrix.


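Example 13.67 The matrix

\[ A = \begin{pmatrix} 1 & 2+i \\ 2-i & 5 \end{pmatrix} \]

is Hermitian and can therefore be unitarily diagonalised. The eigenvalues
are given by

\[ |A - \lambda I| = \begin{vmatrix} 1-\lambda & 2+i \\ 2-i & 5-\lambda \end{vmatrix} = \lambda^2 - 6\lambda + 5 - 5 = 0. \]

So the eigenvalues are 0 and 6. (As expected, these are real numbers.) We
now find the corresponding eigenvectors by row reducing the matrices
$(A - \lambda I)$. For $\lambda_1 = 0$,

\[ A - 0I = \begin{pmatrix} 1 & 2+i \\ 2-i & 5 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 2+i \\ 0 & 0 \end{pmatrix}, \]

so we let

\[ \mathbf{v}_1 = \begin{pmatrix} -2-i \\ 1 \end{pmatrix}. \]

For $\lambda_2 = 6$,

\[ A - 6I = \begin{pmatrix} -5 & 2+i \\ 2-i & -1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & -\frac{2+i}{5} \\ 0 & 0 \end{pmatrix}, \]

so we may take

\[ \mathbf{v}_2 = \begin{pmatrix} 2+i \\ 5 \end{pmatrix} \]

as an eigenvector. These two eigenvectors are orthogonal. The vector
$\mathbf{v}_1$ has $\|\mathbf{v}_1\|^2 = 6$. The vector $\mathbf{v}_2$ has $\|\mathbf{v}_2\|^2 = 30$. If we set

\[ P = \begin{pmatrix} \frac{-2-i}{\sqrt{6}} & \frac{2+i}{\sqrt{30}} \\ \frac{1}{\sqrt{6}} & \frac{5}{\sqrt{30}} \end{pmatrix}, \qquad D = \begin{pmatrix} 0 & 0 \\ 0 & 6 \end{pmatrix}, \]

then $P^*AP = P^{-1}AP = D$.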


Activity 13.68 Check all the calculations in this example. 
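
One way to carry out such a check is numerically. The sketch below, in plain Python, assumes the data of Example 13.67 are $A = \begin{pmatrix} 1 & 2+i \\ 2-i & 5 \end{pmatrix}$ with eigenvectors $(-2-i, 1)^T$ and $(2+i, 5)^T$; the helper names are illustrative.

```python
import math

def herm(M):
    """Hermitian conjugate: conjugate every entry, then transpose."""
    return [[M[i][j].conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

def matmul(A, B):
    """Ordinary matrix product of nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2 + 1j],
     [2 - 1j, 5]]
v1 = [-2 - 1j, 1]     # eigenvector for eigenvalue 0, squared norm 6
v2 = [2 + 1j, 5]      # eigenvector for eigenvalue 6, squared norm 30
n1, n2 = math.sqrt(6), math.sqrt(30)

# Columns of P are the normalised eigenvectors.
P = [[v1[0] / n1, v2[0] / n2],
     [v1[1] / n1, v2[1] / n2]]

PstarP = matmul(herm(P), P)             # should be close to the identity
D = matmul(matmul(herm(P), A), P)       # should be close to diag(0, 6)
for row in D:
    print([round(z.real, 10) + round(z.imag, 10) * 1j for z in row])
```

The rounding in the final loop is only for display; the entries of `D` differ from 0 and 6 by floating-point error of the order of $10^{-16}$.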
We have already seen that all Hermitian matrices are normal. We also
know that the eigenvalues of a Hermitian matrix are real. So all Hermi- 
tian matrices are normal, with real eigenvalues. We can now prove the 
converse; namely, that if a normal matrix has real eigenvalues, then it 
is Hermitian. 


Theorem 13.69 Let A be a normal matrix. If all of the eigenvalues of 
A are real, then A is Hermitian. 


Proof: Since A is normal, it can be unitarily diagonalised. Let P be a 
unitary matrix such that P* AP = D, where D is the diagonal matrix 
of eigenvalues of A. Since A has real eigenvalues, D* = D. Then 
A= PDP* and 


A* =(PDP*)* = PD* P* = PDP* = A, 


which shows that A is Hermitian. 


13.7 Spectral decomposition 


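We will now look at the unitary diagonalisation of a matrix $A$ in a
different way. Let $P$ be a unitary matrix such that $P^*AP = D$, where
$D$ is the diagonal matrix of eigenvalues of $A$, and the columns of $P$ are
the corresponding eigenvectors, $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$. Since $P$ is unitary, we
have $P^*P = PP^* = I$. We used the equality $P^*P = I$ in the proof of
Theorem 13.60 to show that the column vectors of $P$ are an orthonormal
basis of $\mathbb{C}^n$. That is,

\[ I = P^*P = \begin{pmatrix} \mathbf{x}_1^* \\ \mathbf{x}_2^* \\ \vdots \\ \mathbf{x}_n^* \end{pmatrix} (\mathbf{x}_1 \ \mathbf{x}_2 \ \cdots \ \mathbf{x}_n) = \begin{pmatrix} \mathbf{x}_1^*\mathbf{x}_1 & \mathbf{x}_1^*\mathbf{x}_2 & \cdots & \mathbf{x}_1^*\mathbf{x}_n \\ \mathbf{x}_2^*\mathbf{x}_1 & \mathbf{x}_2^*\mathbf{x}_2 & \cdots & \mathbf{x}_2^*\mathbf{x}_n \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{x}_n^*\mathbf{x}_1 & \mathbf{x}_n^*\mathbf{x}_2 & \cdots & \mathbf{x}_n^*\mathbf{x}_n \end{pmatrix}, \]

where the entry $\mathbf{x}_i^*\mathbf{x}_j = \langle \mathbf{x}_j, \mathbf{x}_i \rangle$ is a complex number (either 1 or 0 in
this case). But what information can we derive from the other product,
$PP^* = I$, in terms of the column vectors? Carrying out the matrix
multiplication, we have

\[ I = PP^* = (\mathbf{x}_1 \ \mathbf{x}_2 \ \cdots \ \mathbf{x}_n) \begin{pmatrix} \mathbf{x}_1^* \\ \mathbf{x}_2^* \\ \vdots \\ \mathbf{x}_n^* \end{pmatrix} = \mathbf{x}_1\mathbf{x}_1^* + \mathbf{x}_2\mathbf{x}_2^* + \cdots + \mathbf{x}_n\mathbf{x}_n^*, \]

where this time $E_i = \mathbf{x}_i\mathbf{x}_i^*$ is an $n \times n$ matrix. It is the matrix product
of the $n \times 1$ column vector $\mathbf{x}_i$ with the $1 \times n$ row vector $\mathbf{x}_i^*$. Using the
matrices $E_i$, the above equality can be written as

\[ I = E_1 + E_2 + \cdots + E_n. \]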


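This result is true for any unitary matrix $P$, but it is most interesting
when the columns of $P$ are the eigenvectors of a matrix $A$. The
connection with the matrix $A$ is the following theorem:


Theorem 13.70 (Spectral decomposition) Let $A$ be a normal matrix
and let $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$ be an orthonormal set of eigenvectors of $A$ with
corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Then

\[ A = \lambda_1\mathbf{x}_1\mathbf{x}_1^* + \lambda_2\mathbf{x}_2\mathbf{x}_2^* + \cdots + \lambda_n\mathbf{x}_n\mathbf{x}_n^*. \]


Proof: If $P$ is the unitary matrix whose columns are the eigenvectors
of $A$, then, as we have seen, we can write

\[ I = \mathbf{x}_1\mathbf{x}_1^* + \mathbf{x}_2\mathbf{x}_2^* + \cdots + \mathbf{x}_n\mathbf{x}_n^*. \]

Multiplying both sides of this equality by the matrix $A$, we have

\[ \begin{aligned} A = AI &= A(\mathbf{x}_1\mathbf{x}_1^* + \mathbf{x}_2\mathbf{x}_2^* + \cdots + \mathbf{x}_n\mathbf{x}_n^*) \\ &= A\mathbf{x}_1\mathbf{x}_1^* + A\mathbf{x}_2\mathbf{x}_2^* + \cdots + A\mathbf{x}_n\mathbf{x}_n^* \\ &= \lambda_1\mathbf{x}_1\mathbf{x}_1^* + \lambda_2\mathbf{x}_2\mathbf{x}_2^* + \cdots + \lambda_n\mathbf{x}_n\mathbf{x}_n^*. \end{aligned} \]


The spectral decomposition of $A$ is the formula

\[ A = \lambda_1E_1 + \lambda_2E_2 + \cdots + \lambda_nE_n, \]

where $E_i = \mathbf{x}_i\mathbf{x}_i^*$.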


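Example 13.71 In Example 13.67, we unitarily diagonalised the matrix

\[ A = \begin{pmatrix} 1 & 2+i \\ 2-i & 5 \end{pmatrix}. \]

We found $P^*AP = P^{-1}AP = D$ for the matrices

\[ P = \begin{pmatrix} \frac{-2-i}{\sqrt{6}} & \frac{2+i}{\sqrt{30}} \\ \frac{1}{\sqrt{6}} & \frac{5}{\sqrt{30}} \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 0 & 0 \\ 0 & 6 \end{pmatrix}. \]

Therefore, the spectral decomposition of $A$ is $A = 0E_1 + 6E_2$. Clearly,
we only need to calculate the matrix $E_2 = \mathbf{x}_2\mathbf{x}_2^*$. We have

\[ A = 6E_2 = \frac{6}{30} \begin{pmatrix} 2+i \\ 5 \end{pmatrix} (2-i \ \ 5) = \frac{1}{5} \begin{pmatrix} 5 & 10+5i \\ 10-5i & 25 \end{pmatrix} = \begin{pmatrix} 1 & 2+i \\ 2-i & 5 \end{pmatrix}, \]

as expected. So the spectral decomposition of this matrix is just the
matrix $A$ itself.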
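
The matrices $E_i = \mathbf{x}_i\mathbf{x}_i^*$ are outer products, which are simple to form in code. The following plain-Python sketch assumes the orthonormal eigenvectors of this example are $\mathbf{x}_1 = \frac{1}{\sqrt{6}}(-2-i, 1)^T$ and $\mathbf{x}_2 = \frac{1}{\sqrt{30}}(2+i, 5)^T$; the helper name `outer` is illustrative.

```python
import math

def outer(x):
    """E = x x*, the n x n matrix whose (i, j) entry is x_i * conj(x_j)."""
    return [[xi * xj.conjugate() for xj in x] for xi in x]

x1 = [(-2 - 1j) / math.sqrt(6), 1 / math.sqrt(6)]
x2 = [(2 + 1j) / math.sqrt(30), 5 / math.sqrt(30)]
E1, E2 = outer(x1), outer(x2)

# I = E1 + E2 and A = 0*E1 + 6*E2.
identity = [[E1[i][j] + E2[i][j] for j in range(2)] for i in range(2)]
A_rebuilt = [[0 * E1[i][j] + 6 * E2[i][j] for j in range(2)] for i in range(2)]
print(identity)
print(A_rebuilt)
```

Up to rounding, `A_rebuilt` reproduces the original matrix, and `identity` is the $2 \times 2$ identity matrix.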


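Example 13.72 In Example 11.14, we orthogonally diagonalised the
matrix

\[ B = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}. \]

If we let

\[ P = \begin{pmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ -\frac{2}{\sqrt{6}} & 0 & \frac{1}{\sqrt{3}} \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 4 \end{pmatrix}, \]

then $P^T = P^{-1}$ and $P^TBP = D$.

The spectral decomposition of $B$ is $B = 1E_1 + 1E_2 + 4E_3$, where
the matrices $E_i = \mathbf{x}_i\mathbf{x}_i^T$ are obtained from the orthonormal eigenvectors
of $B$ (which are the columns of $P$). We have

\[ E_1 = \frac{1}{6} \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix} (1 \ \ 1 \ \ -2) = \frac{1}{6} \begin{pmatrix} 1 & 1 & -2 \\ 1 & 1 & -2 \\ -2 & -2 & 4 \end{pmatrix} \]

and, similarly,

\[ E_2 = \frac{1}{2} \begin{pmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad E_3 = \frac{1}{3} \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}. \]

The spectral decomposition of $B$ is

\[ B = \frac{1}{6} \begin{pmatrix} 1 & 1 & -2 \\ 1 & 1 & -2 \\ -2 & -2 & 4 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} + \frac{4}{3} \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}. \]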


Activity 13.73 Check all the calculations for this example. 


Let's take a closer look at the matrices $E_i$. As the following theorem
shows, we will find that they are Hermitian, they are idempotent and
they satisfy $E_iE_j = 0$ if $i \neq j$ (where 0 denotes the zero matrix).


Theorem 13.74 If $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$ is an orthonormal basis of $\mathbb{C}^n$, then
the matrices $E_i = \mathbf{x}_i\mathbf{x}_i^*$ have the following properties:

(i) $E_i^* = E_i$.

(ii) $E_i^2 = E_i$.

(iii) $E_iE_j = 0$ if $i \neq j$.

Proof. We will prove (ii), and leave properties (i) and (iii) for you to
verify. The matrices are idempotent since

\[ E_i^2 = E_iE_i = (\mathbf{x}_i\mathbf{x}_i^*)(\mathbf{x}_i\mathbf{x}_i^*) = \mathbf{x}_i(\mathbf{x}_i^*\mathbf{x}_i)\mathbf{x}_i^* = \mathbf{x}_i\langle \mathbf{x}_i, \mathbf{x}_i \rangle\mathbf{x}_i^* = \mathbf{x}_i \cdot 1 \cdot \mathbf{x}_i^* = E_i. \]

The other two proofs are equally straightforward.


Activity 13.75 Show that (i) $E_i$ is Hermitian and that (iii) $E_iE_j = 0$
if $i \neq j$.


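The fact that each $E_i$ is an idempotent matrix means that it represents
a projection (just as for real matrices). To see what this projection is,
look at its action on the orthonormal basis vectors:

\[ E_i\mathbf{x}_i = (\mathbf{x}_i\mathbf{x}_i^*)\mathbf{x}_i = \mathbf{x}_i(\mathbf{x}_i^*\mathbf{x}_i) = \mathbf{x}_i\langle \mathbf{x}_i, \mathbf{x}_i \rangle = \mathbf{x}_i \cdot 1 = \mathbf{x}_i \]

and, for $j \neq i$,

\[ E_i\mathbf{x}_j = (\mathbf{x}_i\mathbf{x}_i^*)\mathbf{x}_j = \mathbf{x}_i(\mathbf{x}_i^*\mathbf{x}_j) = \mathbf{x}_i \cdot 0 = \mathbf{0}. \]

If $\mathbf{v}$ is any vector in $\mathbb{C}^n$, $\mathbf{v}$ can be written as a unique linear combination
$\mathbf{v} = a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + \cdots + a_n\mathbf{x}_n$. Then

\[ \begin{aligned} E_i\mathbf{v} &= E_i(a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + \cdots + a_n\mathbf{x}_n) \\ &= a_1E_i\mathbf{x}_1 + a_2E_i\mathbf{x}_2 + \cdots + a_{i-1}E_i\mathbf{x}_{i-1} + a_iE_i\mathbf{x}_i + a_{i+1}E_i\mathbf{x}_{i+1} + \cdots + a_nE_i\mathbf{x}_n \\ &= a_i\mathbf{x}_i. \end{aligned} \]

$E_i$ is the orthogonal projection of $\mathbb{C}^n$ onto the subspace spanned by the
vector $\mathbf{x}_i$.


Activity 13.76 Look at the previous example and write down the
orthogonal projection of $\mathbb{C}^3$ onto the subspace $\mathrm{Lin}\{(1, 1, 1)^T\}$.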


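Matrices which satisfy properties (ii) and (iii) of Theorem 13.74 have
an interesting application. Suppose $E_1, E_2, E_3$ are three such matrices;
that is,

\[ E_iE_j = \begin{cases} E_i & \text{for } i = j \\ 0 & \text{for } i \neq j \end{cases} \]

for $i, j = 1, 2, 3$. Then for any real numbers $\alpha_1, \alpha_2, \alpha_3$, and any positive
integer $n$, we will show that

\[ (\alpha_1E_1 + \alpha_2E_2 + \alpha_3E_3)^n = \alpha_1^nE_1 + \alpha_2^nE_2 + \alpha_3^nE_3. \]

To establish this result, we will use an inductive argument. For $n = 2$,
observe that

\[ \begin{aligned} (\alpha_1E_1 + \alpha_2E_2 + \alpha_3E_3)^2 &= (\alpha_1E_1 + \alpha_2E_2 + \alpha_3E_3)(\alpha_1E_1 + \alpha_2E_2 + \alpha_3E_3) \\ &= \alpha_1^2E_1E_1 + \alpha_2^2E_2E_2 + \alpha_3^2E_3E_3 \quad (\text{since } E_iE_j = 0 \text{ for } i \neq j) \\ &= \alpha_1^2E_1 + \alpha_2^2E_2 + \alpha_3^2E_3 \quad (\text{since } E_iE_i = E_i). \end{aligned} \]

Now assume that the result holds for $n$,

\[ (\alpha_1E_1 + \alpha_2E_2 + \alpha_3E_3)^n = \alpha_1^nE_1 + \alpha_2^nE_2 + \alpha_3^nE_3. \]

We will show that it therefore also holds for $n + 1$. In this way, the result
for $n = 2$ above will imply the result is also true for $n = 3$, and so on.
We have

\[ \begin{aligned} (\alpha_1E_1 + \alpha_2E_2 + \alpha_3E_3)^{n+1} &= (\alpha_1E_1 + \alpha_2E_2 + \alpha_3E_3)^n(\alpha_1E_1 + \alpha_2E_2 + \alpha_3E_3) \\ &= (\alpha_1^nE_1 + \alpha_2^nE_2 + \alpha_3^nE_3)(\alpha_1E_1 + \alpha_2E_2 + \alpha_3E_3) \\ &= \alpha_1^{n+1}E_1E_1 + \alpha_2^{n+1}E_2E_2 + \alpha_3^{n+1}E_3E_3 \quad (\text{since } E_iE_j = 0 \text{ for } i \neq j) \\ &= \alpha_1^{n+1}E_1 + \alpha_2^{n+1}E_2 + \alpha_3^{n+1}E_3 \quad (\text{since } E_iE_i = E_i). \end{aligned} \]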


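Example 13.77 Continuing with Example 13.72, for the matrix

\[ B = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}, \]

we have the spectral decomposition $B = E_1 + E_2 + 4E_3$, given by

\[ B = \frac{1}{6} \begin{pmatrix} 1 & 1 & -2 \\ 1 & 1 & -2 \\ -2 & -2 & 4 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} + \frac{4}{3} \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}. \]

Suppose we wish to find a matrix $C$ such that $C^2 = B$. According
to this result, if we set $C = \alpha_1E_1 + \alpha_2E_2 + \alpha_3E_3$, for some constants
$\alpha_1, \alpha_2, \alpha_3$ to be determined, then

\[ C^2 = \alpha_1^2E_1 + \alpha_2^2E_2 + \alpha_3^2E_3 = B = E_1 + E_2 + 4E_3. \]

The constants $\alpha_1 = 1$, $\alpha_2 = 1$ and $\alpha_3 = 2$ will give an appropriate
matrix $C$:

\[ C = \frac{1}{6} \begin{pmatrix} 1 & 1 & -2 \\ 1 & 1 & -2 \\ -2 & -2 & 4 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} + \frac{2}{3} \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}. \]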


Activity 13.78 Calculate C and show that C? = B. 
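
The construction of a square root from the spectral decomposition can be sketched in a few lines of plain Python. The code assumes the orthonormal eigenvectors of $B$ are $\frac{1}{\sqrt{6}}(1, 1, -2)^T$, $\frac{1}{\sqrt{2}}(1, -1, 0)^T$ and $\frac{1}{\sqrt{3}}(1, 1, 1)^T$ with eigenvalues 1, 1, 4; the helper names `outer`, `combine` and `matmul` are illustrative.

```python
import math

def outer(x):
    """E = x x^T for a real vector x."""
    return [[a * b for b in x] for a in x]

def combine(coeffs, mats):
    """Linear combination sum_k coeffs[k] * mats[k] of square matrices."""
    n = len(mats[0])
    return [[sum(c * M[i][j] for c, M in zip(coeffs, mats))
             for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Ordinary matrix product of nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

x1 = [c / math.sqrt(6) for c in (1, 1, -2)]
x2 = [c / math.sqrt(2) for c in (1, -1, 0)]
x3 = [c / math.sqrt(3) for c in (1, 1, 1)]
E = [outer(x) for x in (x1, x2, x3)]

B = combine([1, 1, 4], E)   # spectral decomposition with eigenvalues 1, 1, 4
C = combine([1, 1, 2], E)   # replace each eigenvalue by a square root
C_squared = matmul(C, C)

for row in C:
    print([round(3 * entry, 10) for entry in row])   # entries of 3C
```

Choosing $-1$ or $-2$ for any of the coefficients gives other matrices whose square is $B$; the choice $1, 1, 2$ is the one made in the example.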


13.8 Learning outcomes 


You should now be able to: 


• use complex numbers, understand the three forms of a complex
number and know how to use them

• explain what is meant by a complex matrix, a complex vector space, a
complex inner product, a complex inner product space and translate
results between real ones and complex ones

• diagonalise a complex matrix

• find an orthonormal basis of a complex vector space by applying
the Gram-Schmidt process if necessary

• state what is meant by the Hermitian conjugate of a complex matrix,
a Hermitian matrix and a unitary matrix

• show that a Hermitian matrix has real eigenvalues, and that eigenvectors
of a Hermitian matrix corresponding to distinct eigenvalues
are orthogonal

• demonstrate that you know how to show that a matrix is unitary
if and only if its columns are an orthonormal basis of $\mathbb{C}^n$ and if
and only if the linear transformation it defines preserves the inner
product

• state what it means to unitarily diagonalise a matrix

• state what is meant by a normal matrix and show that Hermitian,
unitary and diagonal matrices are normal

• unitarily diagonalise a normal matrix and show that only normal
matrices can be unitarily diagonalised

• explain what is meant by the spectral decomposition of a normal
matrix and find the spectral decomposition of a given normal matrix

• demonstrate that you know the properties of the matrices $E_i$ in a
spectral decomposition, and that you can use them to obtain an
orthogonal projection from $\mathbb{C}^n$ onto a subspace, or to find a matrix
$B$ such that $B^n = A$ ($n = 2, 3, \ldots$) for a given normal matrix $A$.



13.9 Comments on activities 


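Activity 13.9 We have

\[ (x - w)(x - \overline{w}) = x^2 - (w + \overline{w})x + w\overline{w}. \]

Now, $w + \overline{w} = 2\,\mathrm{Re}(w) = 2(-\frac{1}{2})$ and $w\overline{w} = \frac{1}{4} + \frac{3}{4}$, so the product of
the last two factors is $x^2 + x + 1$.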


Activity 13.11 The points plotted in the complex plane are as follows:

[Figure: the complex plane, with the points $z = 2 + 2i$ and $w = 1 - i\sqrt{3}$ marked.]

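Activity 13.13 Draw the line from the origin to the point $z$ in the
diagram above. Do the same for $w$. For $z$, $|z| = 2\sqrt{2}$ and $\theta = \frac{\pi}{4}$, so
$z = 2\sqrt{2}\left(\cos(\frac{\pi}{4}) + i\sin(\frac{\pi}{4})\right)$. The modulus of $w$ is $|w| = 2$ and the
argument is $-\frac{\pi}{3}$, so that

\[ w = 2\left(\cos\left(-\frac{\pi}{3}\right) + i\sin\left(-\frac{\pi}{3}\right)\right) = 2\left(\cos\left(\frac{\pi}{3}\right) - i\sin\left(\frac{\pi}{3}\right)\right). \]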


Activity 13.14 The set (a) consisting of $z$ such that $|z| = 3$ is the circle
of radius 3 centred at the origin. The set (b), for which the principal
argument of $z$ is $\pi/4$, is the half line from the origin through the point
$(1, 1)$.


Activity 13.20 We have 


; ; 1 1 
eit/2 =i, e372 = i, eidt/4 DETEN see) 
V2 v2 
ei(llx/6) — ,—i(x/6) — v3 = Ta 
2 2 
ei = ee! = e? cos(1) — i e? sin(1). 


Finally, e~> is real, so it is already in the form a + ib. 
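
Activity 13.23 The roots are:

\[ z_1 = e^{i\frac{\pi}{6}}, \quad z_2 = e^{i\frac{3\pi}{6}}, \quad z_3 = e^{i\frac{5\pi}{6}}, \quad z_4 = e^{i\frac{7\pi}{6}}, \quad z_5 = e^{i\frac{9\pi}{6}}, \quad z_6 = e^{i\frac{11\pi}{6}}. \]

These occur in complex conjugate pairs:

\[ z_4 = \overline{z_3} = e^{-i\frac{5\pi}{6}}, \quad z_5 = \overline{z_2} = e^{-i\frac{\pi}{2}}, \quad z_6 = \overline{z_1} = e^{-i\frac{\pi}{6}}. \]

The polynomial factors as

\[ x^6 + 1 = (x - z_1)(x - \overline{z_1})(x - z_2)(x - \overline{z_2})(x - z_3)(x - \overline{z_3}). \]

Using the $a + ib$ form of each complex number, for example, $z_1 =
\frac{\sqrt{3}}{2} + i\frac{1}{2}$, you can carry out the multiplication of the linear terms pairwise
(complex conjugate pairs) to obtain $x^6 + 1$ as a product of irreducible
quadratics with real coefficients,

\[ x^6 + 1 = (x^2 - \sqrt{3}x + 1)(x^2 + \sqrt{3}x + 1)(x^2 + 1). \]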
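
The pairing of conjugate roots into real quadratics can be checked with Python's standard `cmath` module. This is a minimal sketch, assuming the equation being solved is $z^6 = -1$; the variable names are illustrative.

```python
import cmath
import math

# The six solutions of z^6 = -1 are exp(i(pi + 2k*pi)/6) for k = 0, ..., 5.
roots = [cmath.exp(1j * (math.pi + 2 * math.pi * k) / 6) for k in range(6)]

# Each conjugate pair z, conj(z) multiplies out to the real quadratic
# x^2 - 2 Re(z) x + |z|^2; keep one root per pair (positive imaginary part).
quadratics = sorted((-2 * z.real, abs(z) ** 2) for z in roots if z.imag > 0)

for b, c in quadratics:
    print(f"x^2 + ({b:.4f}) x + ({c:.4f})")
```

The three coefficient pairs correspond, up to rounding, to $x^2 - \sqrt{3}x + 1$, $x^2 + 1$ and $x^2 + \sqrt{3}x + 1$.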


Activity 13.28 Any vector $\mathbf{v} \in \mathbb{C}^n$ can be separated into real and imaginary
parts and written as $\mathbf{v} = \mathbf{a} + i\mathbf{b}$, where $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$. By only allowing
real numbers as scalars, this space has a basis consisting of the $n$
vectors $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ together with $n$ vectors $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$, where
$\mathbf{u}_j$ is the vector with every entry equal to 0, except for the $j$th entry
which is equal to the purely imaginary number $i$. Hence, as a real vector
space, $\mathbb{C}^n$ has dimension $2n$.


Activity 13.30 We check $A\mathbf{v}_1 = i\mathbf{v}_1$ and $A\mathbf{v}_2 = -i\mathbf{v}_2$:


(3 G) =G) =) 
(A, o) G)= (4) 4G). 


Activity 13.36 We have 


and 


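\[ \begin{aligned} \langle \alpha\mathbf{x} + \beta\mathbf{y}, \mathbf{z} \rangle &= (\alpha x_1 + \beta y_1)\overline{z_1} + (\alpha x_2 + \beta y_2)\overline{z_2} + \cdots + (\alpha x_n + \beta y_n)\overline{z_n} \\ &= \alpha(x_1\overline{z_1} + x_2\overline{z_2} + \cdots + x_n\overline{z_n}) + \beta(y_1\overline{z_1} + y_2\overline{z_2} + \cdots + y_n\overline{z_n}) \\ &= \alpha\langle \mathbf{x}, \mathbf{z} \rangle + \beta\langle \mathbf{y}, \mathbf{z} \rangle. \end{aligned} \]


Activity 13.37 The norm of $\mathbf{x} = (1, 0, i)^T$ is $\sqrt{2}$ because

\[ \|\mathbf{x}\|^2 = \langle \mathbf{x}, \mathbf{x} \rangle = 1(1) + 0 + (i)(-i) = 2. \]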
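
Activity 13.39 To show that $\langle \mathbf{x}, \alpha\mathbf{y} \rangle = \overline{\alpha}\langle \mathbf{x}, \mathbf{y} \rangle$ for all $\mathbf{x}, \mathbf{y} \in V$ and all
$\alpha \in \mathbb{C}$, we use properties (i) and (ii) of the definition,

\[ \langle \mathbf{x}, \alpha\mathbf{y} \rangle = \overline{\langle \alpha\mathbf{y}, \mathbf{x} \rangle} = \overline{\alpha\langle \mathbf{y}, \mathbf{x} \rangle} = \overline{\alpha}\,\overline{\langle \mathbf{y}, \mathbf{x} \rangle} = \overline{\alpha}\langle \mathbf{x}, \mathbf{y} \rangle. \]

That $\langle \mathbf{x}, \mathbf{y} + \mathbf{z} \rangle = \langle \mathbf{x}, \mathbf{y} \rangle + \langle \mathbf{x}, \mathbf{z} \rangle$ for all $\mathbf{x}, \mathbf{y}, \mathbf{z} \in V$ is proved in a similar
way using properties (i) and (ii) of the definition. We have

\[ \langle \mathbf{x}, \mathbf{y} + \mathbf{z} \rangle = \overline{\langle \mathbf{y} + \mathbf{z}, \mathbf{x} \rangle} = \overline{\langle \mathbf{y}, \mathbf{x} \rangle + \langle \mathbf{z}, \mathbf{x} \rangle} = \overline{\langle \mathbf{y}, \mathbf{x} \rangle} + \overline{\langle \mathbf{z}, \mathbf{x} \rangle} = \langle \mathbf{x}, \mathbf{y} \rangle + \langle \mathbf{x}, \mathbf{z} \rangle. \]


Activity 13.48 If $A = (a_{ij})$, then

\[ (kA)^* = (\overline{ka_{ji}}) = (\overline{k}\,\overline{a_{ji}}) = \overline{k}(\overline{a_{ji}}) = \overline{k}A^*. \]

The analogous result for a real matrix $A$ and a real number $k$ is simply

\[ (kA)^T = kA^T. \]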


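Activity 13.55 Suppose that $\lambda$ and $\mu$ are any two different eigenvalues
of $A$ and that $\mathbf{x}, \mathbf{y}$ are corresponding eigenvectors. Then $A\mathbf{x} = \lambda\mathbf{x}$ and
$A\mathbf{y} = \mu\mathbf{y}$. Then

\[ \mathbf{y}^*A\mathbf{x} = \mathbf{y}^*\lambda\mathbf{x} = \langle \lambda\mathbf{x}, \mathbf{y} \rangle = \lambda\langle \mathbf{x}, \mathbf{y} \rangle. \]

On the other hand, since $A = A^*$,

\[ \mathbf{y}^*A\mathbf{x} = \mathbf{y}^*A^*\mathbf{x} = (A\mathbf{y})^*\mathbf{x} = (\mu\mathbf{y})^*\mathbf{x} = \overline{\mu}\mathbf{y}^*\mathbf{x} = \overline{\mu}\langle \mathbf{x}, \mathbf{y} \rangle. \]

But the eigenvalues of a Hermitian matrix are real (Theorem 13.52), so
$\overline{\mu} = \mu$, and we can conclude from these two expressions for $\mathbf{y}^*A\mathbf{x}$ that
$\lambda\langle \mathbf{x}, \mathbf{y} \rangle = \mu\langle \mathbf{x}, \mathbf{y} \rangle$, or

\[ (\lambda - \mu)\langle \mathbf{x}, \mathbf{y} \rangle = 0. \]

Since $\lambda \neq \mu$, we deduce that $\langle \mathbf{x}, \mathbf{y} \rangle = 0$. That is, $\mathbf{x}$ and $\mathbf{y}$ are orthogonal.


Activity 13.59 From the definition, $PP^* = P^*P = I$. Since $(P^*)^* =
P$, we have $(P^*)^*P^* = P^*(P^*)^* = I$, so $P^*$ is unitary.
If $A$ and $B$ are unitary matrices, then

\[ (AB)(AB)^* = ABB^*A^* = AA^* = I, \]

and, similarly, $(AB)^*(AB) = I$, which shows that $AB$ is unitary.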


Activity 13.65 The matrix 
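(which is normal since it is diagonal) is not Hermitian since the diagonal
entries are not real, and it is not unitary since $AA^* \neq I$.


Activity 13.75 To prove (i) $E_i^* = E_i$, we have

\[ E_i^* = (\mathbf{x}_i\mathbf{x}_i^*)^* = (\mathbf{x}_i^*)^*\mathbf{x}_i^* = \mathbf{x}_i\mathbf{x}_i^* = E_i. \]

For (iii), if $i \neq j$, then

\[ E_iE_j = (\mathbf{x}_i\mathbf{x}_i^*)(\mathbf{x}_j\mathbf{x}_j^*) = \mathbf{x}_i(\mathbf{x}_i^*\mathbf{x}_j)\mathbf{x}_j^* = \mathbf{x}_i\langle \mathbf{x}_j, \mathbf{x}_i \rangle\mathbf{x}_j^* = \mathbf{x}_i \cdot 0 \cdot \mathbf{x}_j^* = 0. \]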


Activity 13.76 The orthogonal projection is given by the matrix
$E_3 = \mathbf{x}_3\mathbf{x}_3^*$, where $\mathbf{x}_3$ is the unit vector parallel to $(1, 1, 1)^T$. That is,
$P : \mathbb{C}^3 \to \mathrm{Lin}\{(1, 1, 1)^T\}$ is given by $P(\mathbf{v}) = E_3\mathbf{v}$, where

\[ E_3 = \frac{1}{3} \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}. \]

Compare this with the method you learned in the previous chapter,
which will, of course, give you the same solution.


Activity 13.78 The matrix $C$ is

\[ C = \frac{1}{3} \begin{pmatrix} 4 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 4 \end{pmatrix}. \]

The calculation of $C^2$ is straightforward, and $C^2 = B$.


13.10 Exercises 


Exercise 13.1 Find the four complex roots of the equation $z^4 = -4$
and express them in the form $a + ib$. Use these to write $z^4 + 4$ as a
product of quadratic factors with real coefficients.


Exercise 13.2 Suppose the complex matrix A and vector b are as 


follows: 
1 i 1 
(Gih A) =C) 


Calculate the determinant of A, |A|. Find the solution of the system of 
equations Ax = b. 


Exercise 13.3 Consider the real vector space $M_2(\mathbb{R})$ of $2 \times 2$ matrices
with real entries and the complex vector space $M_2(\mathbb{C})$ of $2 \times 2$ matrices
with complex entries, and the subsets $W_1 \subseteq M_2(\mathbb{R})$ and $W_2 \subseteq M_2(\mathbb{C})$:


2 2 
mal(S i reah ma{(% B)jasech. 


Show that $W_1$ is not a subspace of $M_2(\mathbb{R})$, but that $W_2$ is a subspace of
$M_2(\mathbb{C})$.

Find a basis and state the dimension of the subspace $\mathrm{Lin}(S)$ in
$M_2(\mathbb{R})$ and in $M_2(\mathbb{C})$, where


S={(o 2) 0 1) Co 3) Co D) 


Exercise 13.4 Find the eigenvalues of the matrix 


1 i 
4=(.4; 41): 


Express each of the eigenvalues in the form a + ib. 


Exercise 13.5 Diagonalise the matrix 


3 -l 
EF 
Exercise 13.6 Let

\[ A = \begin{pmatrix} 5 & 5 & -5 \\ 3 & 3 & -5 \\ 4 & 0 & -2 \end{pmatrix} \quad \text{and} \quad \mathbf{v} = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}. \]

(a) Show that $\mathbf{v}$ is an eigenvector of $A$ and find the corresponding
eigenvalue.
(b) Show that $\lambda = 4 + 2i$ is an eigenvalue of $A$ and find a corresponding
eigenvector.
(c) Deduce a third eigenvalue and corresponding eigenvector of $A$.
Hence write down an invertible matrix $P$ and a diagonal matrix $D$ such
that $P^{-1}AP = D$.

Is it possible to unitarily diagonalise the matrix $A$? Do this if it is
possible, or explain why it is not possible.


Exercise 13.7 Show that the vectors

\[ \mathbf{v}_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ i \\ 0 \end{pmatrix} \quad \text{and} \quad \mathbf{v}_2 = \frac{1}{3\sqrt{2}} \begin{pmatrix} 1+2i \\ 2-i \\ 2+2i \end{pmatrix} \]

form an orthonormal set, $S = \{\mathbf{v}_1, \mathbf{v}_2\}$.
Extend $S$ to an orthonormal basis of $\mathbb{C}^3$.


Exercise 13.8 What is a unitary matrix? What does it mean to say
that a matrix is unitarily diagonalisable? Prove that if $A$ is unitarily
diagonalisable, then $AA^* = A^*A$.


Exercise 13.9 Prove that if A is a unitary matrix, then all eigenvalues 
of A have a modulus equal to 1. 


Exercise 13.10 Let 


Express $A$ in the form

\[ A = \lambda_1E_1 + \lambda_2E_2 + \lambda_3E_3, \]

where $\lambda_1, \lambda_2, \lambda_3$ are the eigenvalues of $A$, and $E_1, E_2, E_3$ are symmetric
idempotent matrices such that if $i \neq j$, then $E_iE_j$ is the zero matrix.
Determine a matrix B such that B? = A. 


Exercise 13.11 (a) If $A$ is any $m \times k$ complex matrix, prove that the
matrix $A^*A$ is Hermitian and normal.

(b) Prove that, for any $m \times k$ matrix $A$ of rank $k$, the matrix $A^*A$ is
positive definite, meaning that $\mathbf{v}^*(A^*A)\mathbf{v} > 0$ for all $\mathbf{v} \in \mathbb{C}^k$, $\mathbf{v} \neq \mathbf{0}$.
Prove also that $A^*A$ is invertible.


Exercise 13.12 Explain why you know that the matrix 


1 i 0 
s=; 1 o) 
0 0 1 


can be unitarily diagonalised before finding any eigenvalues and eigen- 
vectors. 

Then find a unitary matrix P and a diagonal matrix D such that 
P*BP = D. 

Write down the spectral decomposition of B. 


13.11 Problems 


Problem 13.1 Consider the complex numbers 
_ W3-i)8 
A (ieee 
Plot z and w as points in the complex plane. Express them in exponential 
form and hence evaluate q. Express q in the form a + ib. 


z=V¥3-i, w=1+4i, 

Problem 13.2 Write each of the following complex numbers in the


form a + ib: 
; 30 = 30 ; lia ; _ = 7 PAEA 
e om ea, ei , elti, e 7 e 3+i2V5 | dele. 


Problem 13.3 Find the roots $w$ and $\overline{w}$ of the equation $x^2 - 4x + 7 = 0$.
For these values of $w$ and $\overline{w}$, find the real and imaginary parts of
the following functions, 


f=—e™, teR; eH =w, tez”. 
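Problem 13.4 Let $y(t)$ be the function

\[ y(t) = Ae^{\lambda t} + Be^{\overline{\lambda}t}, \]

where $A, B \in \mathbb{C}$ are constants and $\lambda = a + ib$. Show that $y(t)$ can be
written in the alternative forms

\[ y(t) = e^{at}(Ae^{ibt} + Be^{-ibt}) = e^{at}(\hat{A}\cos bt + \hat{B}\sin bt), \]

where $\hat{A} = A + B$ and $\hat{B} = i(A - B)$. How can $A$ and $B$ be chosen so
that $\hat{A}$ and $\hat{B}$ are real? For $\hat{A}$ and $\hat{B}$ real, show that $y(t)$ can be written
as

\[ y(t) = e^{at}(\hat{A}\cos bt + \hat{B}\sin bt) = Ce^{at}\cos(bt - \phi), \]

where $C = \sqrt{\hat{A}^2 + \hat{B}^2} = 2\sqrt{AB}$ and $\phi$ satisfies $\tan\phi = \hat{B}/\hat{A}$.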


Problem 13.5 Show that for any $z \in \mathbb{C}$, the expressions $e^{zt} + e^{\overline{z}t}$,
$t \in \mathbb{R}$, and $z^t + \overline{z}^t$, $t \in \mathbb{Z}^+$, are both real.


Problem 13.6 Find the three complex roots of the equation $z^3 = -1$.
Illustrate the roots as points in the complex plane.

Find the roots of $z^4 = -1$ and illustrate them on another graph of
the complex plane. Without actually solving the equations, illustrate the
roots of $x^5 = -1$ and $x^6 = 64$ as points in the complex plane.


Problem 13.7 Let

\[ C = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}. \]

(a) Find the eigenvalues of the matrix $C$.
Diagonalise $C$ by finding complex matrices $P$ (invertible) and $D$
(diagonal) such that $P^{-1}CP = D$.

(b) Let $y_1(t), y_2(t)$ be functions of the real variable $t$ which satisfy the
system of differential equations, $\mathbf{y}' = C\mathbf{y}$ with $\mathbf{y} = (y_1, y_2)^T$:

\[ \begin{aligned} y_1'(t) &= y_1(t) - y_2(t) \\ y_2'(t) &= y_1(t) + y_2(t) \end{aligned} \]

and the initial conditions $y_1(0) = 0$ and $y_2(0) = 1$.

Set $\mathbf{y} = P\mathbf{z}$ and use the diagonalisation from part (a) to find the
solutions $y_1(t), y_2(t)$ in complex form.

Simplify your solutions using Euler's formula to express $y_1(t)$
and $y_2(t)$ as real functions of $t$.


Problem 13.8 Consider the following subspaces of $\mathbb{C}^3$:


sf) ro) 


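Find vectors $\mathbf{x}, \mathbf{y}, \mathbf{z}$ in $\mathbb{C}^3$ which satisfy all of the following:

(i) the vector $\mathbf{x}$ is in both subspaces $U$ and $W$; that is, $\mathbf{x} \in U \cap W$;
(ii) the set $\{\mathbf{x}, \mathbf{y}\}$ is an orthonormal basis of $U$;
(iii) the set $\{\mathbf{x}, \mathbf{z}\}$ is an orthonormal basis of $W$.

Is your set $\{\mathbf{x}, \mathbf{y}, \mathbf{z}\}$ a basis of $\mathbb{C}^3$? Is it an orthonormal basis of $\mathbb{C}^3$?
Justify your answers.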


Problem 13.9 Show that the vectors 


1 0 
v= (3] and wo (2-2) 
1 142i 


are orthogonal in $\mathbb{C}^3$ with the standard inner product. Write down
an orthonormal basis $\{u_1, u_2\}$ for the linear span of these vectors,
$\mathrm{Lin}\{v_1, v_2\}$.

Extend this to an orthonormal basis $B$ of $\mathbb{C}^3$, by finding an appropriate
vector $u_3$. (Try this two ways: a Gram-Schmidt process, solving
a system of linear equations.)

Express the vector $a = (1, 1, i)^T$ as a linear combination of
$u_1, u_2, u_3$.

Find the matrix of the orthogonal projection of $\mathbb{C}^3$ onto $\mathrm{Lin}\{u_1\}$.
Call this matrix $E_1$. Show that $E_1$ is both Hermitian and idempotent.
Find $E_1 a$.

Find the matrices $E_2$ and $E_3$ of the orthogonal projections of $\mathbb{C}^3$
onto $\mathrm{Lin}\{u_2\}$ and $\mathrm{Lin}\{u_3\}$ respectively. Then express the identity matrix
$I$ as a linear combination of the matrices $E_1$, $E_2$ and $E_3$.


Problem 13.10 Find an orthonormal basis of $\mathbb{C}^3$ consisting of eigenvectors
of the matrix

$$A = \begin{pmatrix} 2 & 1+i & 0 \\ 1-i & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix}.$$

Is the matrix $A$ Hermitian? Is $A$ normal? (Justify your answers.)


Problem 13.11 What does it mean to say that a complex matrix $A$ is
normal? Write down the equations that $a, b, c, d$ must satisfy for the
$2 \times 2$ real matrix

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad a, b, c, d \in \mathbb{R},$$

to be normal. Show that every $2 \times 2$ real normal matrix is either
symmetric or a multiple of an orthogonal matrix.


Problem 13.12 Show that the matrix 


TT 


is normal, but that it is not Hermitian. 
Find the eigenvalues and corresponding eigenvectors of A. Find a 
unitary matrix P and a diagonal matrix D such that P*AP = D. 
Write down the spectral decomposition of A. 


Problem 13.13 Consider the following matrix $A$, where $z$ is a complex
number:
0 i 
a= a 


For which values of z is the matrix A unitary? For which values of z is 
A normal? 


Problem 13.14 Unitarily diagonalise the matrix

$$A = \begin{pmatrix} 2 & i & 0 \\ -i & 2 & 0 \\ 0 & 0 & 5i \end{pmatrix}.$$

Is the matrix $A$ Hermitian? Is $A$ normal? (Justify your answers.)


Problem 13.15 A complex matrix $A$ is called skew-Hermitian if
$A^* = -A$. Prove the following three statements about skew-Hermitian
matrices:


(1) The non-zero eigenvalues of a skew-Hermitian matrix are all 
purely imaginary. 

(2) If A is skew-Hermitian, then eigenvectors corresponding to dis- 
tinct eigenvalues are mutually orthogonal. 

(3) Skew-Hermitian matrices are normal. 


Problem 13.16 Show that the matrix

$$A = \begin{pmatrix} i & 1 & 0 \\ -1 & 0 & -1 \\ 0 & 1 & i \end{pmatrix}$$

is skew-Hermitian. (See Problem 13.15.)
Find the spectral decomposition of $A$ and deduce the spectral
decomposition of $A^*$.


Problem 13.17 Let $A$ be a normal $n \times n$ matrix and let $P$ be a unitary
matrix such that $P^*AP = D$, where $D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ and the
columns of $P$ are the corresponding eigenvectors, $x_1, x_2, \ldots, x_n$. Show
that for a positive integer $k$

$$A^k = \lambda_1^k E_1 + \lambda_2^k E_2 + \cdots + \lambda_n^k E_n = PD^kP^*,$$

where $E_i = x_i x_i^*$.
Find the spectral decomposition of the following matrix $A$. (See
Problem 13.10.)

$$A = \begin{pmatrix} 2 & 1+i & 0 \\ 1-i & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix}.$$


Deduce the spectral decomposition of $A^3$, and use it to find the
matrix $A^3$.
Chapter 1 exercises


4 
Exercise 1.1 (a) Ad = 3 


(b) AB is (2 x 3)(3 x 3) giving a2 x 3 matrix. C isa3 x 2 matrix, so 
AB + C is not defined. 


@asce=(4 ioth 2 =O $ 4) 
wore=(12 x(a 2)-(4 å) 

10 1 i 1 0 5 
osc=|2 ı Te 2) = (4 s); 

1 1 -l -1 4 5 -l 


10 1 
(f) dB=(2 -1 D: 1 1 }=0 0 0). 
1 


A 


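(g) $Cd$ is not defined.

(h) $d^Td = \begin{pmatrix} 2 & -1 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ -1 \\ 1 \end{pmatrix} = 6.$

(i) $dd^T = \begin{pmatrix} 2 \\ -1 \\ 1 \end{pmatrix}\begin{pmatrix} 2 & -1 & 1 \end{pmatrix} = \begin{pmatrix} 4 & -2 & 2 \\ -2 & 1 & -1 \\ 2 & -1 & 1 \end{pmatrix}.$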


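Exercise 1.2 The matrix $A$ must be $2 \times 2$. Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Then the
equation

$$\begin{pmatrix} 1 & 7 \\ 5 & 0 \\ 9 & 3 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} -4 & 14 \\ 15 & 0 \\ 24 & x \end{pmatrix}$$

holds if and only if

$$a + 7c = -4, \quad 5a = 15, \quad \text{and} \quad 9a + 3c = 24$$

and

$$b + 7d = 14, \quad 5b = 0, \quad \text{and} \quad 9b + 3d = x.$$

Solving the first two equations for $a$ and $c$, we have $a = 3$ and $c = -1$.
This solution also satisfies the third equation, $9a + 3c = 9(3) + 3(-1) = 24$,
so the matrix $A$ does exist. Solving the second set of
equations for $b$ and $d$, we find $b = 0$ and $d = 2$, therefore
$x = 9b + 3d = 6$. The matrix $A$ is

$$A = \begin{pmatrix} 3 & 0 \\ -1 & 2 \end{pmatrix}.$$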


Exercise 1.3 By definition, to prove that the matrix $AB$ is invertible
you have to show that there exists a matrix, $C$, such that

$$(AB)C = C(AB) = I.$$

You are given that $C = B^{-1}A^{-1}$. Since both $A$ and $B$ are invertible
matrices, you know that both $A^{-1}$ and $B^{-1}$ exist and both are $n \times n$,
so the matrix product $B^{-1}A^{-1}$ is defined. So all you need to do is to
show that if you multiply $AB$ on the left or on the right by the matrix
$B^{-1}A^{-1}$, then you will obtain the identity matrix, $I$.

$$\begin{aligned}
(AB)(B^{-1}A^{-1}) &= A(BB^{-1})A^{-1} && \text{(matrix multiplication is associative)} \\
&= AIA^{-1} && \text{(by the definition of } B^{-1}\text{)} \\
&= AA^{-1} && \text{(since } AI = A \text{ for any matrix } A\text{)} \\
&= I && \text{(by the definition of } A^{-1}\text{)}.
\end{aligned}$$

In the same way,

$$(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}IB = B^{-1}B = I.$$

Hence $B^{-1}A^{-1}$ is the inverse of the matrix $AB$.
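
If you would like to see the result of Exercise 1.3 at work numerically, the following is a minimal sketch in Python using exact fractions. The two matrices `A` and `B` are hypothetical examples chosen only for illustration (they do not come from the exercise), and the helper names `matmul` and `inverse_2x2` are our own.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix with non-zero determinant."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical invertible matrices, chosen only for illustration.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
I = [[1, 0], [0, 1]]

C = matmul(inverse_2x2(B), inverse_2x2(A))   # candidate inverse B^{-1} A^{-1}
AB = matmul(A, B)

# Both products should be the identity matrix.
print(matmul(AB, C) == I, matmul(C, AB) == I)
```

Note the order of the factors: the candidate inverse is $B^{-1}A^{-1}$, not $A^{-1}B^{-1}$.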


Exercise 1.4 Use the method shown on page 19 to find the inverse
matrix. Then solve for $A$ and you should obtain


1 0 
AS( i hs 
5 2 


Exercise 1.5 Begin by taking the inverse of both sides of this equation
and then use $A^{-1}$. $A$ is square, since it is invertible. If $A$ is $n \times n$, then
$B$ must also be $n \times n$ for $AB$ to be defined and invertible. Simplifying
the equation, you can deduce that $B = \frac{1}{2}I$, where $I$ is the $n \times n$ identity
matrix.


Exercise 1.6 Since $B^T$ is a $k \times m$ matrix, $B^TB$ is $k \times k$. Furthermore,
$(B^TB)^T = B^T(B^T)^T = B^TB$, which shows that it is symmetric.


Exercise 1.7 The expression $(A^TA)^{-1}A^T(B^{-1}A^T)^TB^TB^2B^{-1}$ simplifies
to $B$. Be careful how you do this: you cannot assume the existence of a
matrix $A^{-1}$; indeed, if $m \neq n$, such a matrix cannot exist.


Exercise 1.8 (a) A vector equation of the line in R? is 


EEA 108 
~w] (I 3)’ i 
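(b) A vector equation of the line in $\mathbb{R}^5$ is

$$x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \\ -1 \\ 2 \\ 5 \end{pmatrix} + t\begin{pmatrix} 5 \\ -3 \\ -1 \\ 1 \\ 4 \end{pmatrix}, \quad t \in \mathbb{R}.$$

The point $(4, 3, 2, 1, 4)$ is not on this line as there is no value of $t$ for
which

$$\begin{pmatrix} 4 \\ 3 \\ 2 \\ 1 \\ 4 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \\ -1 \\ 2 \\ 5 \end{pmatrix} + t\begin{pmatrix} 5 \\ -3 \\ -1 \\ 1 \\ 4 \end{pmatrix}.$$

For example, in order to have $x_4 = 1$, then $t = -1$, and none of the
other components corresponds to this value of $t$.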


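Exercise 1.9 To obtain the vector equation of the line, set

$$\frac{x - 1}{3} = y + 2 = \frac{5 - z}{4} = t$$

and solve for $x, y, z$. You should easily obtain $x = 1 + 3t$, $y = t - 2$
and $z = 5 - 4t$, so the equation is

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \\ 5 \end{pmatrix} + t\begin{pmatrix} 3 \\ 1 \\ -4 \end{pmatrix}, \quad t \in \mathbb{R}.$$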


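Exercise 1.10 The vector equations of the three lines are

$$L_1: \; x = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} + t\begin{pmatrix} -1 \\ 5 \\ 4 \end{pmatrix}, \; t \in \mathbb{R}, \qquad L_2: \; x = \begin{pmatrix} 8 \\ 0 \\ -3 \end{pmatrix} + s\begin{pmatrix} 6 \\ 2 \\ -1 \end{pmatrix}, \; s \in \mathbb{R},$$

$$L_3: \; x = \begin{pmatrix} 9 \\ 3 \\ 1 \end{pmatrix} + q\begin{pmatrix} 2 \\ -10 \\ -8 \end{pmatrix}, \; q \in \mathbb{R},$$

taking $(9, 3, 1)^T - (7, 13, 9)^T = (2, -10, -8)^T$ as the direction vector.
The lines $L_1$ and $L_3$ are parallel, since their direction vectors
are scalar multiples, $-2(-1, 5, 4)^T = (2, -10, -8)^T$. The lines are not
coincident since the point $(9, 3, 1)$ does not lie on $L_1$ (check this).
The lines $L_1$ and $L_2$ either intersect or are skew. They will intersect
if there are constants $s, t \in \mathbb{R}$ satisfying the following equations:

$$\begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} + t\begin{pmatrix} -1 \\ 5 \\ 4 \end{pmatrix} = \begin{pmatrix} 8 \\ 0 \\ -3 \end{pmatrix} + s\begin{pmatrix} 6 \\ 2 \\ -1 \end{pmatrix}
\iff
\begin{cases} 1 - t = 8 + 6s \\ 3 + 5t = 2s \\ 2 + 4t = -3 - s \end{cases}
\iff
\begin{cases} 6s + t = -7 \\ 2s - 5t = 3 \\ s + 4t = -5. \end{cases}$$

Use two of these equations to solve for $s$ and $t$, then check the solution
in the third equation. The three equations have the solution $s = -1$ and
$t = -1$, so the two lines intersect in the point $(2, -2, -2)$. The angle
of intersection is the acute angle between their direction vectors. Since

$$\left\langle \begin{pmatrix} -1 \\ 5 \\ 4 \end{pmatrix}, \begin{pmatrix} 6 \\ 2 \\ -1 \end{pmatrix} \right\rangle = -6 + 10 - 4 = 0,$$

the angle between them is $\frac{\pi}{2}$ and the two lines are perpendicular.


The lines $L_2$ and $L_3$ are skew. To show this, you need to show that
there is no solution, $q, s \in \mathbb{R}$, to the system of equations

$$\begin{cases} 9 + 2q = 8 + 6s \\ 3 - 10q = 2s \\ 1 - 8q = -3 - s \end{cases}
\iff
\begin{cases} 6s - 2q = 1 \\ 2s + 10q = 3 \\ s - 8q = -4. \end{cases}$$

Solving the first two equations, you obtain $q = \frac{1}{4}$ and $s = \frac{1}{4}$, but this
solution does not satisfy the third equation, so the lines do not intersect
and are skew.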
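
The recipe used twice in Exercise 1.10 (solve two of the component equations, then check the third) can be sketched in Python as follows. This is only an illustration: the function name `intersect` is our own, the points and direction vectors are those read off from the component equations above, and the code assumes that the first two component equations have a unique solution.

```python
from fractions import Fraction

def intersect(p1, d1, p2, d2):
    """Return (t, s) with p1 + t*d1 == p2 + s*d2, or None if there is none.

    Solves the first two component equations by Cramer's rule, then checks
    the third; assumes those two equations have a unique solution.
    """
    # t*d1 - s*d2 = p2 - p1, component by component
    a, b, e = d1[0], -d2[0], p2[0] - p1[0]
    c, d, f = d1[1], -d2[1], p2[1] - p1[1]
    det = a * d - b * c
    t = Fraction(e * d - b * f, det)
    s = Fraction(a * f - e * c, det)
    if p1[2] + t * d1[2] == p2[2] + s * d2[2]:
        return t, s
    return None

L1 = ((1, 3, 2), (-1, 5, 4))
L2 = ((8, 0, -3), (6, 2, -1))
L3 = ((9, 3, 1), (2, -10, -8))

print(intersect(*L1, *L2))   # parameters (t, s) of the common point
print(intersect(*L3, *L2))   # None: L2 and L3 do not meet
```

For $L_1$ and $L_2$ the function returns $t = -1$, $s = -1$; for $L_3$ and $L_2$ the first two equations give $q = s = \frac{1}{4}$, which fails the third equation, so it returns `None`.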
Exercise 1.11 The plane which contains the two parallel lines contains
the points $(1, 3, 2)$ and $(9, 3, 1)$ and the direction vector of the two lines,
$v = (-1, 5, 4)^T$. So it also contains the line through the two points,
which has direction $w = (8, 0, -1)^T$. Therefore, a vector equation of
the plane is

$$x = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} + s\begin{pmatrix} -1 \\ 5 \\ 4 \end{pmatrix} + t\begin{pmatrix} 8 \\ 0 \\ -1 \end{pmatrix} = p + sv + tw, \quad s, t \in \mathbb{R}.$$

There are two ways to obtain the Cartesian equation from this. You can
find a normal vector $n = (a, b, c)^T$ by solving the two linear equations
obtained from $\langle n, v \rangle = 0$ and $\langle n, w \rangle = 0$, namely

$$-a + 5b + 4c = 0 \quad \text{and} \quad 8a - c = 0.$$

There will be infinitely many solutions, and you can just choose one.

Or, you can write down the three component equations resulting
from the vector equation and eliminate $s$ and $t$. This last method is
more easily accomplished by first finding the Cartesian equation of a
parallel plane through the origin:

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = s\begin{pmatrix} -1 \\ 5 \\ 4 \end{pmatrix} + t\begin{pmatrix} 8 \\ 0 \\ -1 \end{pmatrix}
\implies
\begin{cases} x = -s + 8t \\ y = 5s \\ z = 4s - t. \end{cases}$$

Eliminating $s$ and $t$ from these equations yields

$$5x - 31y + 40z = 0,$$

and you should check that this is correct by showing that the vector
$n = (5, -31, 40)^T$ is orthogonal to both $v$ and $w$. Then, since
$5(1) - 31(3) + 40(2) = -8$, a Cartesian equation of the plane is

$$5x - 31y + 40z = -8.$$

Again, you can check that the point $(9, 3, 1)$ also satisfies this equation.
You should try both methods.


Exercise 1.12 The normal to the plane is orthogonal to the direction of 


the line, as 
2 al 
1 2 


Since the point (2,3, 1) on the line does not satisfy the equation of 
the plane, the line is parallel to the plane. Therefore, it makes sense to 
ask for the distance of the line from the plane. This can be found by
dropping a perpendicular from the line to the plane and measuring its 
length. A method for doing this is given in the question. 

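The line through $(2, 3, 1)$ and parallel to the normal to the plane is
perpendicular to the plane. A vector equation of the line is

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix} + t\begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}.$$

Equating components, we have $x = 2 + 2t$, $y = 3$, and $z = 1 + t$. At
the point of intersection of the line with the plane, these components
will satisfy the equation of the plane, so that

$$2x + z = 9 \implies 2(2 + 2t) + (1 + t) = 9 \implies 5 + 5t = 9,$$

or $t = \frac{4}{5}$. Then putting this value for $t$ in the line, we find the point of
intersection is

$$p = \begin{pmatrix} 18/5 \\ 3 \\ 9/5 \end{pmatrix}.$$

The distance between the line and the plane is the distance between this
point and the point $(2, 3, 1)$, which is given by the length of the vector

$$\begin{pmatrix} 18/5 \\ 3 \\ 9/5 \end{pmatrix} - \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 8/5 \\ 0 \\ 4/5 \end{pmatrix},$$

so the distance is $\frac{4\sqrt{5}}{5}$.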
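
As a cross-check on the perpendicular-dropping method, here is a small Python sketch that uses the standard point-to-plane distance formula $|\langle n, p \rangle - d| / \|n\|$ instead. This formula is a swapped-in alternative, not the method of the exercise, and the function name `distance_to_plane` is our own; the plane $2x + z = 9$ and the point $(2, 3, 1)$ are those of the solution above.

```python
import math

def distance_to_plane(point, normal, d):
    """Distance from `point` to the plane <normal, x> = d."""
    dot = sum(n * p for n, p in zip(normal, point))
    return abs(dot - d) / math.sqrt(sum(n * n for n in normal))

# Plane 2x + z = 9 has normal (2, 0, 1); (2, 3, 1) is a point on the line.
dist = distance_to_plane((2, 3, 1), (2, 0, 1), 9)
print(dist, 4 * math.sqrt(5) / 5)   # the two values should agree
```

Because the line is parallel to the plane, any point on the line gives the same distance.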


Chapter 2 exercises 
Exercise 2.1 For this solution, we’ll indicate the row operations. 


i =k a 3 ee ely sale e 
(a) (Alb) = (- a a) an (o 1 2 =) 
e=: 2 igh ON: 229" a 40 
from which it follows that


This system has a unique solution. The three planes intersect in one 
point. 


Dat, Be 4K 28 ee E 
(b) (Alb) = ( 1 -1 7 jes (2 -1 3 ‘ 
5) 2. Ww F s 2, oly A 


fe 1 -1 1 l1 1 -1 1 
sn fo —3 5 2) = (o —3 5 2) 
Aa tay EE. 0 0 0 0 


ee ee ae ee 10 3 3 
aes (o i 2 i) (o i: ae =), 
00 0 0 00 0 0 


Set z = t and then solve for x and y in terms of t. There are infinitely 
many solutions: 


n 5 _ 2, 5 2 
3 3 3 
Z t 0 1 


The three planes intersect in a line. If you set z = 3s, then this line of 
solutions can be written as 


-GG 


Exercise 2.2 The system of equations in part (a) is the associated 
homogeneous system of part (b), so to solve both you can row reduce 
the augmented matrix and then interpret the solutions of each system. 
Doing this, 


=j a ae 6 D ee 
(Alb) = 3 2 10 -10) = (o l1 1 8 ) 
=, ao 5) T 0 1 1 -3 


A 


a 


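from which you can already see that the system is inconsistent
since $y + z = 8$ and $y + z = -3$ is impossible. However, the homogeneous
system is always consistent, so continuing the reduction of the
matrix $A$,

$$A \longrightarrow \cdots \longrightarrow \begin{pmatrix} 1 & -1 & 3 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 0 & 4 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

So the answers are:

(a) $x = t\begin{pmatrix} -4 \\ -1 \\ 1 \end{pmatrix}$, $t \in \mathbb{R}$, and

(b) no solutions.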


Exercise 2.3 Solve the first two equations simultaneously using Gaussian
elimination. The general solution takes the form $x = p + sw$,
$s \in \mathbb{R}$, where $p = (1, 0, 0)^T$ and $w = (0, -1, 1)^T$, which is the
equation of the line of intersection of the two planes.

The third plane intersects the first two in the same line. You can
determine this by solving the linear system of three equations using
Gaussian elimination. Alternatively, you can notice that the line of
intersection of the first two planes is in the third plane (since its direction
is perpendicular to the normal, and the point $p$ satisfies the Cartesian
equation of the plane), so this must be the intersection of all three
planes.


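Exercise 2.4 To solve the system of equations $Ax = b$, where

$$A = \begin{pmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 0 & -1 \\ 3 & 4 & 2 & 4 \end{pmatrix}, \qquad b = \begin{pmatrix} 4 \\ 1 \\ 9 \end{pmatrix},$$

using Gaussian elimination, put the augmented matrix into reduced row
echelon form:

$$(A|b) = \left(\begin{array}{cccc|c} 2 & 3 & 1 & 1 & 4 \\ 1 & 2 & 0 & -1 & 1 \\ 3 & 4 & 2 & 4 & 9 \end{array}\right) \longrightarrow \left(\begin{array}{cccc|c} 1 & 2 & 0 & -1 & 1 \\ 2 & 3 & 1 & 1 & 4 \\ 3 & 4 & 2 & 4 & 9 \end{array}\right)$$

$$\longrightarrow \left(\begin{array}{cccc|c} 1 & 2 & 0 & -1 & 1 \\ 0 & -1 & 1 & 3 & 2 \\ 0 & -2 & 2 & 7 & 6 \end{array}\right) \longrightarrow \left(\begin{array}{cccc|c} 1 & 2 & 0 & -1 & 1 \\ 0 & 1 & -1 & -3 & -2 \\ 0 & 0 & 0 & 1 & 2 \end{array}\right)$$

$$\longrightarrow \left(\begin{array}{cccc|c} 1 & 2 & 0 & 0 & 3 \\ 0 & 1 & -1 & 0 & 4 \\ 0 & 0 & 0 & 1 & 2 \end{array}\right) \longrightarrow \left(\begin{array}{cccc|c} 1 & 0 & 2 & 0 & -5 \\ 0 & 1 & -1 & 0 & 4 \\ 0 & 0 & 0 & 1 & 2 \end{array}\right).$$

There is one non-leading variable. If $x = (x, y, z, w)^T$, then set $z = t$.
Reading from the matrix (starting from the bottom row), a general
solution is

$$x = \begin{pmatrix} -5 - 2t \\ 4 + t \\ t \\ 2 \end{pmatrix} = \begin{pmatrix} -5 \\ 4 \\ 0 \\ 2 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 1 \\ 0 \end{pmatrix} = p + tv, \quad t \in \mathbb{R}.$$

Checking the solution by multiplying the matrices, you should find
$Ap = b$ and $Av = 0$. If your solution is not correct, then you will now
know it, in which case you should go back and correct it.

$$Ap = \begin{pmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 0 & -1 \\ 3 & 4 & 2 & 4 \end{pmatrix}\begin{pmatrix} -5 \\ 4 \\ 0 \\ 2 \end{pmatrix} = \begin{pmatrix} 4 \\ 1 \\ 9 \end{pmatrix}, \qquad Av = \begin{pmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 0 & -1 \\ 3 & 4 & 2 & 4 \end{pmatrix}\begin{pmatrix} -2 \\ 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$

The reduced row echelon form of the matrix $A$ consists of the first four
columns of the reduced row echelon form of the augmented matrix
above, that is

$$\begin{pmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

(i) No, there is no vector $d$ for which the system is inconsistent. Since
there is a leading one in every row of the reduced row echelon form of
$A$, the system of equations $Ax = d$ is consistent for all $d \in \mathbb{R}^3$.

(ii) No, there is no vector $d$ for which the system has a unique solution.
The system of equations $Ax = d$ has infinitely many solutions for all
$d \in \mathbb{R}^3$ since there will always be a free variable. There is no leading
one in the third column.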
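
If you want to check a row reduction like the one in Exercise 2.4 by machine, the sketch below implements Gauss-Jordan elimination with exact fractions. It is a minimal illustration, not a robust library routine: the function name `rref` is our own, and the pivoting strategy (take the first non-zero entry in each column) is just one of several valid choices, although the reduced row echelon form it arrives at does not depend on that choice.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row echelon form of a matrix (list of rows)."""
    M = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(M[0])):
        # find a row at or below pivot_row with a non-zero entry in this column
        pivot = next((r for r in range(pivot_row, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        lead = M[pivot_row][col]
        M[pivot_row] = [x / lead for x in M[pivot_row]]      # make a leading one
        for r in range(len(M)):
            if r != pivot_row:                               # clear the rest of the column
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == len(M):
            break
    return M

# Augmented matrix (A|b) of Exercise 2.4.
augmented = [[2, 3, 1, 1, 4],
             [1, 2, 0, -1, 1],
             [3, 4, 2, 4, 9]]
for row in rref(augmented):
    print([str(x) for x in row])
```

The intermediate matrices differ from those shown above, because the code does not swap the first two rows, but the final reduced row echelon form is the same.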



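Exercise 2.5 Reducing the matrix $C$ to reduced row echelon form,
beginning with

$$C = \begin{pmatrix} 1 & 2 & -1 & 3 & 8 \\ -3 & -1 & 8 & 6 & 1 \\ -1 & 0 & 3 & 1 & -2 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 2 & -1 & 3 & 8 \\ 0 & 5 & 5 & 15 & 25 \\ 0 & 2 & 2 & 4 & 6 \end{pmatrix},$$

you should eventually obtain (after four more steps)

$$\longrightarrow \cdots \longrightarrow \begin{pmatrix} 1 & 0 & -3 & 0 & 4 \\ 0 & 1 & 1 & 0 & -1 \\ 0 & 0 & 0 & 1 & 2 \end{pmatrix}.$$

(a) If $C$ is the augmented matrix of a system of equations $Ax = b$,
so that $C = (A|b)$, then there is one non-leading variable in the third
column, so the solutions are

$$x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 4 + 3t \\ -1 - t \\ t \\ 2 \end{pmatrix} = \begin{pmatrix} 4 \\ -1 \\ 0 \\ 2 \end{pmatrix} + t\begin{pmatrix} 3 \\ -1 \\ 1 \\ 0 \end{pmatrix}, \quad t \in \mathbb{R}.$$

These solutions are in $\mathbb{R}^4$.

(b) If $C$ is the coefficient matrix of a homogeneous system of equations,
$Cx = 0$, then there are two non-leading variables, one in column three
and one in column five. Set $x_3 = s$ and $x_5 = t$. Then the solutions are

$$x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 3s - 4t \\ -s + t \\ s \\ -2t \\ t \end{pmatrix} = s\begin{pmatrix} 3 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + t\begin{pmatrix} -4 \\ 1 \\ 0 \\ -2 \\ 1 \end{pmatrix}, \quad s, t \in \mathbb{R}.$$

These solutions are in $\mathbb{R}^5$.

(c) To find $d$,

$$Cw = \begin{pmatrix} 1 & 2 & -1 & 3 & 8 \\ -3 & -1 & 8 & 6 & 1 \\ -1 & 0 & 3 & 1 & -2 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 11 \\ 12 \\ 1 \end{pmatrix} = d.$$

To find all solutions of $Cx = d$, there is no need to use Gaussian
elimination again. You know that $Cw = d$ and from part (b) you have the
solutions of the associated homogeneous system. So using the Principle
of Linearity, a general solution of $Cx = d$ is given by

$$x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 1 \end{pmatrix} + s\begin{pmatrix} 3 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + t\begin{pmatrix} -4 \\ 1 \\ 0 \\ -2 \\ 1 \end{pmatrix}, \quad s, t \in \mathbb{R}.$$
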
Exercise 2.6 Using row operations on the matrix B, you should obtain
the reduced row echelon form,

$$B \longrightarrow \cdots \longrightarrow \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

Therefore, the only solution of $Bx = 0$ is the trivial solution, $x = 0$. So
the null space is the set containing only the zero vector, $\{0\} \subset \mathbb{R}^3$.

The vector $d = c_1 + 2c_2 - c_3 = \begin{pmatrix} 0 \\ 4 \\ -5 \\ -8 \end{pmatrix}$.

To find all solutions of $Bx = d$, you just need to find one particular
solution. Using the expression of a matrix product

$$Bx = x_1c_1 + x_2c_2 + x_3c_3$$

(see Theorem 1.38), you can deduce that

$$x = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}$$

is one solution. Since the null space consists of only the zero vector,
this solution is the only solution.
Exercise 3.1 We begin with $A$, by row reducing $(A|I)$.
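$$(A|I) = \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & 2 & 0 & 1 & 0 \\ 3 & 8 & 1 & 0 & 0 & 1 \end{array}\right) \xrightarrow{R_3 - 3R_1} \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & 2 & 0 & 1 & 0 \\ 0 & 2 & 4 & -3 & 0 & 1 \end{array}\right).$$

The next step, $R_3 - 2R_2$, will yield a row of zeros in the row echelon
form of $A$, and therefore the matrix $A$ is not invertible.
For the matrix $B$,

$$(B|I) = \left(\begin{array}{ccc|ccc} -1 & 2 & 1 & 1 & 0 & 0 \\ 0 & 1 & 2 & 0 & 1 & 0 \\ 3 & 1 & 4 & 0 & 0 & 1 \end{array}\right) \xrightarrow{-R_1} \left(\begin{array}{ccc|ccc} 1 & -2 & -1 & -1 & 0 & 0 \\ 0 & 1 & 2 & 0 & 1 & 0 \\ 3 & 1 & 4 & 0 & 0 & 1 \end{array}\right)$$

$$\xrightarrow{R_3 - 3R_1,\; R_3 - 7R_2,\; -\frac{1}{7}R_3} \left(\begin{array}{ccc|ccc} 1 & -2 & -1 & -1 & 0 & 0 \\ 0 & 1 & 2 & 0 & 1 & 0 \\ 0 & 0 & 1 & -\frac{3}{7} & 1 & -\frac{1}{7} \end{array}\right)$$

$$\xrightarrow{R_2 - 2R_3,\; R_1 + R_3} \left(\begin{array}{ccc|ccc} 1 & -2 & 0 & -\frac{10}{7} & 1 & -\frac{1}{7} \\ 0 & 1 & 0 & \frac{6}{7} & -1 & \frac{2}{7} \\ 0 & 0 & 1 & -\frac{3}{7} & 1 & -\frac{1}{7} \end{array}\right) \xrightarrow{R_1 + 2R_2} \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & \frac{2}{7} & -1 & \frac{3}{7} \\ 0 & 1 & 0 & \frac{6}{7} & -1 & \frac{2}{7} \\ 0 & 0 & 1 & -\frac{3}{7} & 1 & -\frac{1}{7} \end{array}\right).$$

This final matrix shows that

$$B^{-1} = \frac{1}{7}\begin{pmatrix} 2 & -7 & 3 \\ 6 & -7 & 2 \\ -3 & 7 & -1 \end{pmatrix}.$$

Next, check this is correct by multiplying $BB^{-1}$,

$$BB^{-1} = \frac{1}{7}\begin{pmatrix} -1 & 2 & 1 \\ 0 & 1 & 2 \\ 3 & 1 & 4 \end{pmatrix}\begin{pmatrix} 2 & -7 & 3 \\ 6 & -7 & 2 \\ -3 & 7 & -1 \end{pmatrix} = \frac{1}{7}\begin{pmatrix} 7 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & 7 \end{pmatrix} = I.$$

We'll answer the next part of the question for the matrix $B$ first. To
solve $Bx = b$, we can use this inverse matrix. The unique solution is
$x = B^{-1}b$,

$$x = \frac{1}{7}\begin{pmatrix} 2 & -7 & 3 \\ 6 & -7 & 2 \\ -3 & 7 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ 5 \end{pmatrix} = \frac{1}{7}\begin{pmatrix} 10 \\ 9 \\ -1 \end{pmatrix}.$$

Check that this is correct by multiplying $Bx$ to get $b$.

Since the matrix $B$ is invertible, by Theorem 3.8 we know that
$Bx = d$ is consistent for all $d \in \mathbb{R}^3$.
For the matrix $A$, to solve $Ax = b$ we need to try Gaussian elimination.
Row reducing the augmented matrix,

$$(A|b) = \left(\begin{array}{ccc|c} 1 & 2 & -1 & 1 \\ 0 & 1 & 2 & 1 \\ 3 & 8 & 1 & 5 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|c} 1 & 2 & -1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 2 & 4 & 2 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|c} 1 & 0 & -5 & -1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right).$$

This system is consistent with infinitely many solutions. Setting the
non-leading variable equal to $t$, the solutions are

$$x = \begin{pmatrix} -1 + 5t \\ 1 - 2t \\ t \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} 5 \\ -2 \\ 1 \end{pmatrix}, \quad t \in \mathbb{R}.$$

Since the matrix $A$ is not invertible, there will be vectors $d \in \mathbb{R}^3$ for
which the system is inconsistent. Looking at the row reduction above,
an easy choice is $d = (1, 1, 4)^T$, for then

$$(A|d) = \left(\begin{array}{ccc|c} 1 & 2 & -1 & 1 \\ 0 & 1 & 2 & 1 \\ 3 & 8 & 1 & 4 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|c} 1 & 2 & -1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 2 & 4 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|c} 1 & 0 & -5 & -1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 1 \end{array}\right),$$

which shows that the system $Ax = d$ is inconsistent.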

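Exercise 3.2 Note the row operations as you reduce $A$ to reduced row
echelon form,

$$A = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \\ 1 & 4 & -1 \end{pmatrix} \xrightarrow{R_3 - R_1} \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \\ 0 & 4 & -3 \end{pmatrix} \xrightarrow{R_3 - 4R_2} \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix}$$

$$\xrightarrow{R_2 + R_3} \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \xrightarrow{R_1 - 2R_3} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = I.$$

There were four row operations required to do this, so express this as
$E_4E_3E_2E_1A = I$. Then $A = (E_1)^{-1}(E_2)^{-1}(E_3)^{-1}(E_4)^{-1}$, where, for
example, the last row operation, $R_1 - 2R_3$, is given by the elementary
matrix $E_4$, so the inverse elementary matrix $E_4^{-1}$ performs the operation
$R_1 + 2R_3$.
Then the matrix $A$ is given by

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 4 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

You should multiply out this product to check that you do obtain $A$.
Then using the fact that the determinant of the product is equal to
the product of the determinants,

$$|A| = |E_1^{-1}|\,|E_2^{-1}|\,|E_3^{-1}|\,|E_4^{-1}|.$$

Each of the above elementary matrices represents a row operation RO3,
adding a multiple of one row to another, which does not change the value
of the determinant. So for each $i = 1, 2, 3, 4$, $|E_i^{-1}| = 1$. Therefore,
$|A| = 1$.

To check this, we'll use the cofactor expansion by row 1,

$$|A| = \begin{vmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \\ 1 & 4 & -1 \end{vmatrix} = 1(-1 + 4) + 2(0 - 1) = 1.$$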

Exercise 3.3 (a) Expanding the first determinant using row 1,

$$\begin{vmatrix} 5 & 2 & -4 \\ -3 & 1 & 1 \\ -1 & 7 & 2 \end{vmatrix} = 5(2 - 7) - 2(-6 + 1) - 4(-21 + 1) = -25 + 10 + 80 = 65.$$

You should check this by expanding by a different row or column.
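
One way to check a cofactor expansion is to let a short recursive function carry it out. The sketch below (the name `det` is our own) always expands along the first row, so it is meant for small matrices only; the first matrix has the entries read off from the expansion in part (a), and the second is the matrix $A$ of Exercise 3.2.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        # minor: delete row 1 and column j+1
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

print(det([[5, 2, -4], [-3, 1, 1], [-1, 7, 2]]))   # matrix of part (a)
print(det([[1, 0, 2], [0, 1, -1], [1, 4, -1]]))    # matrix A of Exercise 3.2
```

The two printed values should agree with the hand calculations, 65 and 1.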
(b) The obvious cofactor expansion for this determinant should be using 
column 3. 


=6/1 4 3|=6[2(4-23)-1(5-— 1) = -12 
1 40 3 ka ii 
0 1 0 1 


Exercise 3.4 You will make fewer errors when there is an unknown
constant present if you expand by a row or column containing the
unknown. So, for example, expanding by column 3:

$$|B| = \begin{vmatrix} 2 & 1 & w \\ 3 & 4 & -1 \\ 1 & -2 & 7 \end{vmatrix} = w(-10) + 1(-5) + 7(5) = -10w + 30.$$

Therefore, $|B| = 0$ if and only if $w = 3$.
Exercise 3.5 Ideally, you want to obtain a leading one as the (1, 1) entry
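of the determinant. This can be accomplished by initially doing a row
operation which will put either a 1 or $-1$ at the start of one of the rows.
There are several choices for this; we will do $R_2 - R_3$ to replace row 2.

$$\begin{vmatrix} 5 & 2 & -4 & -2 \\ -3 & 1 & 5 & 1 \\ -4 & 3 & 1 & 3 \\ 2 & 1 & -1 & 1 \end{vmatrix} = \begin{vmatrix} 5 & 2 & -4 & -2 \\ 1 & -2 & 4 & -2 \\ -4 & 3 & 1 & 3 \\ 2 & 1 & -1 & 1 \end{vmatrix} = -\begin{vmatrix} 1 & -2 & 4 & -2 \\ 5 & 2 & -4 & -2 \\ -4 & 3 & 1 & 3 \\ 2 & 1 & -1 & 1 \end{vmatrix}$$

$$= -\begin{vmatrix} 1 & -2 & 4 & -2 \\ 0 & 12 & -24 & 8 \\ 0 & -5 & 17 & -5 \\ 0 & 5 & -9 & 5 \end{vmatrix} = -4\begin{vmatrix} 3 & -6 & 2 \\ -5 & 17 & -5 \\ 5 & -9 & 5 \end{vmatrix} = -4\begin{vmatrix} 3 & -6 & 2 \\ 0 & 8 & 0 \\ 5 & -9 & 5 \end{vmatrix}$$

$$= -32\begin{vmatrix} 3 & 2 \\ 5 & 5 \end{vmatrix} = -160\begin{vmatrix} 3 & 2 \\ 1 & 1 \end{vmatrix} = -160(1) = -160.$$

The question asks you to check this result by evaluating the determinant
using column operations. Staring at the determinant, you might notice
that the second and fourth columns have all their lower entries equal,
so begin by replacing column 2 with $C_2 - C_4$, which will not change
the value of the determinant, and then expand by column 2.

$$\begin{vmatrix} 5 & 2 & -4 & -2 \\ -3 & 1 & 5 & 1 \\ -4 & 3 & 1 & 3 \\ 2 & 1 & -1 & 1 \end{vmatrix} = \begin{vmatrix} 5 & 4 & -4 & -2 \\ -3 & 0 & 5 & 1 \\ -4 & 0 & 1 & 3 \\ 2 & 0 & -1 & 1 \end{vmatrix} = -4\begin{vmatrix} -3 & 5 & 1 \\ -4 & 1 & 3 \\ 2 & -1 & 1 \end{vmatrix} = -4(40) = -160,$$

where the $3 \times 3$ determinant was evaluated using a cofactor expansion.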


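Exercise 3.6 To answer this question, find the determinant of $A$,

$$|A| = \begin{vmatrix} 7 - \lambda & -15 \\ 2 & -4 - \lambda \end{vmatrix} = (7 - \lambda)(-4 - \lambda) + 30 = \lambda^2 - 3\lambda + 2 = (\lambda - 1)(\lambda - 2).$$

Therefore, $A^{-1}$ does not exist $\iff |A| = 0 \iff \lambda = 1$ or $\lambda = 2$.
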
Exercise 3.7 Given that A is 3 x 3 and |A| = 7,


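$$\begin{aligned}
|2A| &= 2^3|A| = 56, \\
|A^2| &= |A|\,|A| = 49, \\
|2A^{-1}| &= 2^3|A^{-1}| = \frac{2^3}{|A|} = \frac{8}{7}, \\
|(2A)^{-1}| &= \frac{1}{|2A|} = \frac{1}{56}.
\end{aligned}$$
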

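Exercise 3.8 The first thing you should do for each matrix is evaluate
its determinant to see if it is invertible. ($B$ is the same matrix as in
Exercise 3.1, so you can compare this method of finding the inverse
matrix to the method using row operations.)

$$|B| = \begin{vmatrix} -1 & 2 & 1 \\ 0 & 1 & 2 \\ 3 & 1 & 4 \end{vmatrix} = -1(4 - 2) + 3(4 - 1) = 7,$$

so $B^{-1}$ exists.
Next find the cofactors,

$$C_{11} = \begin{vmatrix} 1 & 2 \\ 1 & 4 \end{vmatrix} = 2, \qquad C_{12} = -\begin{vmatrix} 0 & 2 \\ 3 & 4 \end{vmatrix} = 6, \qquad C_{13} = \begin{vmatrix} 0 & 1 \\ 3 & 1 \end{vmatrix} = -3,$$

$$C_{21} = -\begin{vmatrix} 2 & 1 \\ 1 & 4 \end{vmatrix} = -7, \qquad C_{22} = \begin{vmatrix} -1 & 1 \\ 3 & 4 \end{vmatrix} = -7, \qquad C_{23} = -\begin{vmatrix} -1 & 2 \\ 3 & 1 \end{vmatrix} = 7,$$

$$C_{31} = \begin{vmatrix} 2 & 1 \\ 1 & 2 \end{vmatrix} = 3, \qquad C_{32} = -\begin{vmatrix} -1 & 1 \\ 0 & 2 \end{vmatrix} = 2, \qquad C_{33} = \begin{vmatrix} -1 & 2 \\ 0 & 1 \end{vmatrix} = -1.$$

Then,

$$B^{-1} = \frac{1}{7}\begin{pmatrix} 2 & -7 & 3 \\ 6 & -7 & 2 \\ -3 & 7 & -1 \end{pmatrix}.$$

Check that $BB^{-1} = I$. For the matrix $C$, expanding by row 1,

$$|C| = \begin{vmatrix} 5 & 2 & -1 \\ 1 & 3 & 4 \\ 6 & 5 & 3 \end{vmatrix} = 5(-11) - 2(-21) - 1(-13) = 0,$$

so $C$ is not invertible.
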

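Exercise 3.9 The system of equations is

$$\begin{aligned} -x + 2y + z &= 1 \\ y + 2z &= 1 \\ 3x + y + 4z &= 5. \end{aligned}$$

This has as its coefficient matrix the matrix $B$ of Exercise 3.1. We
know from the previous exercise that $|B| = 7$. Since the determinant is
non-zero, we can use Cramer's rule. Then

$$x = \frac{1}{|B|}\begin{vmatrix} 1 & 2 & 1 \\ 1 & 1 & 2 \\ 5 & 1 & 4 \end{vmatrix} = \frac{(4 - 2) - 2(4 - 10) + (1 - 5)}{|B|} = \frac{10}{7},$$

$$y = \frac{1}{|B|}\begin{vmatrix} -1 & 1 & 1 \\ 0 & 1 & 2 \\ 3 & 5 & 4 \end{vmatrix} = \frac{-(4 - 10) - (0 - 6) + (0 - 3)}{|B|} = \frac{9}{7},$$

$$z = \frac{1}{|B|}\begin{vmatrix} -1 & 2 & 1 \\ 0 & 1 & 1 \\ 3 & 1 & 5 \end{vmatrix} = \frac{-(5 - 1) - 2(0 - 3) + (0 - 3)}{|B|} = -\frac{1}{7},$$

which, of course, agrees with the result in Exercise 3.1.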
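
Cramer's rule is mechanical enough to be written out in a few lines of Python. The following is a sketch for $3 \times 3$ systems only, using exact fractions; the function names `det3` and `cramer` are our own, and the code assumes the coefficient matrix has non-zero determinant.

```python
from fractions import Fraction

def det3(M):
    """Determinant of a 3 x 3 matrix by expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cramer(M, rhs):
    """Solve M x = rhs for a 3 x 3 matrix M with non-zero determinant."""
    D = det3(M)
    solution = []
    for j in range(3):
        # replace column j of M by the right-hand side
        Mj = [row[:j] + [rhs[r]] + row[j + 1:] for r, row in enumerate(M)]
        solution.append(Fraction(det3(Mj), D))
    return solution

B = [[-1, 2, 1], [0, 1, 2], [3, 1, 4]]
b = [1, 1, 5]
print([str(v) for v in cramer(B, b)])
```

For this system the three numerators are 10, 9 and $-1$, each divided by $|B| = 7$.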


Exercise 3.10 (a) Electricity is industry $i_2$, so column 2 gives the
amounts of each industry needed to produce \$1 of electricity:

$$c_{12} \leftrightarrow \$0.30 \text{ water}, \qquad c_{22} \leftrightarrow \$0.10 \text{ electricity}, \qquad c_{32} \leftrightarrow 0 \text{ gas}.$$

(b) To solve $(I - C)x = d$, you can either use Gaussian elimination
or the inverse matrix or Cramer's rule. We will find the inverse matrix
using the cofactors method.

$$(I - C) = \begin{pmatrix} 0.8 & -0.3 & -0.2 \\ -0.4 & 0.9 & -0.2 \\ 0 & 0 & 0.9 \end{pmatrix} \implies (I - C)^{-1} = \frac{1}{0.54}\begin{pmatrix} 0.81 & 0.27 & 0.24 \\ 0.36 & 0.72 & 0.24 \\ 0 & 0 & 0.6 \end{pmatrix}.$$

Then the solution is given by

$$x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \frac{1}{0.54}\begin{pmatrix} 0.81 & 0.27 & 0.24 \\ 0.36 & 0.72 & 0.24 \\ 0 & 0 & 0.6 \end{pmatrix}\begin{pmatrix} 40{,}000 \\ 100{,}000 \\ 72{,}000 \end{pmatrix} = \begin{pmatrix} 142{,}000 \\ 192{,}000 \\ 80{,}000 \end{pmatrix}.$$

The weekly production should be \$142,000 water, \$192,000 electricity
and \$80,000 gas.


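Exercise 3.11 (a) The cross product $u \times v$ is

$$w = u \times v = \begin{vmatrix} e_1 & e_2 & e_3 \\ 1 & 2 & 3 \\ 2 & -5 & 4 \end{vmatrix} = 23e_1 + 2e_2 - 9e_3 = \begin{pmatrix} 23 \\ 2 \\ -9 \end{pmatrix}.$$

This vector is perpendicular to both $u$ and $v$ since

$$\left\langle \begin{pmatrix} 23 \\ 2 \\ -9 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \right\rangle = 23 + 4 - 27 = 0, \qquad \left\langle \begin{pmatrix} 23 \\ 2 \\ -9 \end{pmatrix}, \begin{pmatrix} 2 \\ -5 \\ 4 \end{pmatrix} \right\rangle = 46 - 10 - 36 = 0.$$

(b) You are being asked to show that the inner product of a vector
$a \in \mathbb{R}^3$ with the cross product of two vectors $b, c \in \mathbb{R}^3$ is given by the
determinant with these three vectors as its rows. To show this, start with
an expression for $b \times c$

$$b \times c = \begin{vmatrix} e_1 & e_2 & e_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = \begin{vmatrix} b_2 & b_3 \\ c_2 & c_3 \end{vmatrix}e_1 - \begin{vmatrix} b_1 & b_3 \\ c_1 & c_3 \end{vmatrix}e_2 + \begin{vmatrix} b_1 & b_2 \\ c_1 & c_2 \end{vmatrix}e_3$$

and then take the inner product with $a$:

$$\langle a, b \times c \rangle = \begin{vmatrix} b_2 & b_3 \\ c_2 & c_3 \end{vmatrix}a_1 - \begin{vmatrix} b_1 & b_3 \\ c_1 & c_3 \end{vmatrix}a_2 + \begin{vmatrix} b_1 & b_2 \\ c_1 & c_2 \end{vmatrix}a_3 = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}.$$

This shows that the inner product is equal to the given $3 \times 3$ determinant
(as it is equal to its expansion by row 1).

To show that $b \times c$ is orthogonal to both $b$ and $c$, we just calculate
the inner products $\langle b, b \times c \rangle$ and $\langle c, b \times c \rangle$ using the above determinant
expression and show that each is equal to 0:

$$\langle b, b \times c \rangle = \begin{vmatrix} b_1 & b_2 & b_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = 0 \quad \text{and} \quad \langle c, b \times c \rangle = \begin{vmatrix} c_1 & c_2 & c_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = 0,$$

since each of these determinants has two equal rows. Hence the vector
$b \times c$ is perpendicular to both $b$ and $c$.

(c) Therefore, the vector $b \times c = n$ is orthogonal to all linear
combinations of the vectors $b$ and $c$ and so to the plane determined by these
vectors; that is, $b \times c$ is a normal vector to the plane containing the
vectors $b$ and $c$.

If $\langle a, b \times c \rangle = \langle a, n \rangle = 0$, then $a$ must be in this plane; the three
vectors are coplanar.

All these statements are reversible.

The given vectors are coplanar when

$$\begin{vmatrix} 3 & -1 & 2 \\ t & 5 & 1 \\ -2 & 3 & 1 \end{vmatrix} = 0,$$

so if and only if $t = -4$.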
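
The calculations of Exercise 3.11 can be replayed with two small helper functions. The sketch below is for illustration only: the names `cross`, `dot` and `triple` are our own, and the three vectors inside `triple` are the rows of the determinant in part (c).

```python
def cross(b, c):
    """Cross product of two vectors in R^3."""
    return (b[1] * c[2] - b[2] * c[1],
            b[2] * c[0] - b[0] * c[2],
            b[0] * c[1] - b[1] * c[0])

def dot(a, b):
    """Standard inner product of two real vectors."""
    return sum(x * y for x, y in zip(a, b))

u, v = (1, 2, 3), (2, -5, 4)
w = cross(u, v)
print(w, dot(w, u), dot(w, v))   # w, followed by the two inner products

def triple(t):
    """<a, b x c> for the three vectors of part (c), as a function of t."""
    return dot((3, -1, 2), cross((t, 5, 1), (-2, 3, 1)))

print(triple(-4), triple(0))     # zero only for the coplanar case t = -4
```

Expanding by hand, `triple(t)` is $7t + 28$, which vanishes exactly when $t = -4$.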
Exercise 4.1 Write down the augmented matrix and put it into reduced
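row echelon form,

$$(A|b) = \left(\begin{array}{ccccc|c} 1 & 5 & 3 & 7 & 1 & 2 \\ 2 & 10 & 3 & 8 & 5 & -5 \\ 1 & 5 & 1 & 3 & 3 & -4 \end{array}\right) \xrightarrow{R_2 - 2R_1,\; R_3 - R_1} \left(\begin{array}{ccccc|c} 1 & 5 & 3 & 7 & 1 & 2 \\ 0 & 0 & -3 & -6 & 3 & -9 \\ 0 & 0 & -2 & -4 & 2 & -6 \end{array}\right)$$

$$\longrightarrow \left(\begin{array}{ccccc|c} 1 & 5 & 3 & 7 & 1 & 2 \\ 0 & 0 & 1 & 2 & -1 & 3 \\ 0 & 0 & -2 & -4 & 2 & -6 \end{array}\right) \longrightarrow \left(\begin{array}{ccccc|c} 1 & 5 & 3 & 7 & 1 & 2 \\ 0 & 0 & 1 & 2 & -1 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right)$$

$$\longrightarrow \left(\begin{array}{ccccc|c} 1 & 5 & 0 & 1 & 4 & -7 \\ 0 & 0 & 1 & 2 & -1 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right).$$

Set the non-leading variables to arbitrary constants, say, $x_2 = s$, $x_4 = t$,
and $x_5 = u$, and solve for the leading variables in terms of these
parameters. Starting with the bottom row, write down the vector
solution:

$$x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} -7 - 5s - t - 4u \\ s \\ 3 - 2t + u \\ t \\ u \end{pmatrix}
= \begin{pmatrix} -7 \\ 0 \\ 3 \\ 0 \\ 0 \end{pmatrix} + s\begin{pmatrix} -5 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} + t\begin{pmatrix} -1 \\ 0 \\ -2 \\ 1 \\ 0 \end{pmatrix} + u\begin{pmatrix} -4 \\ 0 \\ 1 \\ 0 \\ 1 \end{pmatrix}$$

$$= p + sv_1 + tv_2 + uv_3.$$

The rank of $A$ is 2 since there are two leading ones. The matrix $A$ has
five columns, so there are $n - r = 5 - 2 = 3$ vectors $v_i$. The solution
is in the form required by the question.

Verify the solution as asked by performing the matrix multiplication.
Do actually carry this out. We will show the first two.

$$Ap = \begin{pmatrix} 1 & 5 & 3 & 7 & 1 \\ 2 & 10 & 3 & 8 & 5 \\ 1 & 5 & 1 & 3 & 3 \end{pmatrix}\begin{pmatrix} -7 \\ 0 \\ 3 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} -7 + 9 \\ -14 + 9 \\ -7 + 3 \end{pmatrix} = \begin{pmatrix} 2 \\ -5 \\ -4 \end{pmatrix},$$

$$Av_1 = \begin{pmatrix} 1 & 5 & 3 & 7 & 1 \\ 2 & 10 & 3 & 8 & 5 \\ 1 & 5 & 1 & 3 & 3 \end{pmatrix}\begin{pmatrix} -5 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} -5 + 5 \\ -10 + 10 \\ -5 + 5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$

You can use any solution $x$ (so any values of $s, t, u \in \mathbb{R}$) to write $b$ as a
linear combination of the columns of $A$, so this can be done in infinitely
many ways. In particular, taking $x = p$, and letting $c_i$ indicate column
$i$ of the coefficient matrix $A$,

$$Ap = -7c_1 + 3c_3.$$

You should write this out in detail and check that the sum of the vectors
does add to the vector $b$. Notice that this combination uses only the
columns corresponding to the leading variables:

$$-7\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} + 3\begin{pmatrix} 3 \\ 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ -5 \\ -4 \end{pmatrix}.$$

Similarly, since $Av_1 = 0$, $Av_2 = 0$ and $Av_3 = 0$, any linear combination
of these three vectors will give a vector $w = sv_1 + tv_2 + uv_3$ for which
$Aw = 0$, and you can rewrite $Aw$ as a linear combination of the columns
of $A$. For example, taking $v_1$ (so $s = 1$, $t = u = 0$),

$$-5c_1 + c_2 = -5\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} + \begin{pmatrix} 5 \\ 10 \\ 5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$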
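
The verification asked for in Exercise 4.1 (that $Ap = b$ and $Av_i = 0$) is also easy to run in Python. This is a minimal sketch: the helper name `matvec` is our own, and the parameter values `s, t, u = 2, -1, 3` are arbitrary choices made only to illustrate that every choice gives a solution.

```python
def matvec(A, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 5, 3, 7, 1],
     [2, 10, 3, 8, 5],
     [1, 5, 1, 3, 3]]
b = [2, -5, -4]
p = [-7, 0, 3, 0, 0]
v1 = [-5, 1, 0, 0, 0]
v2 = [-1, 0, -2, 1, 0]
v3 = [-4, 0, 1, 0, 1]

print(matvec(A, p) == b)                        # p is a particular solution
print([matvec(A, v) for v in (v1, v2, v3)])     # each should be the zero vector

# any choice of parameters gives another solution of Ax = b
s, t, u = 2, -1, 3
x = [pi + s * a + t * c + u * d for pi, a, c, d in zip(p, v1, v2, v3)]
print(matvec(A, x) == b)
```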


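Exercise 4.2 Using row operations, the matrix $A$ reduces to echelon
form

$$A \longrightarrow \cdots \longrightarrow \begin{pmatrix} 1 & 0 & 1 & 0 & 2 \\ 0 & 1 & -1 & 1 & -1 \\ 0 & 0 & 1 & -1 & 3 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

There are three non-zero rows (three leading ones), so the rank of $A$
is 3.

To find $N(A)$, we need to solve $Ax = 0$, which is a system of four
equations in five unknowns. Call them $x_1, x_2, x_3, x_4, x_5$. Continuing to
reduced echelon form,

$$A \longrightarrow \cdots \longrightarrow \begin{pmatrix} 1 & 0 & 0 & 1 & -1 \\ 0 & 1 & 0 & 0 & 2 \\ 0 & 0 & 1 & -1 & 3 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

The leading variables are $x_1$, $x_2$, and $x_3$. Set the non-leading variables
$x_4 = s$ and $x_5 = t$. Then the solution is

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} -s + t \\ -2t \\ s - 3t \\ s \\ t \end{pmatrix} = s\begin{pmatrix} -1 \\ 0 \\ 1 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} 1 \\ -2 \\ -3 \\ 0 \\ 1 \end{pmatrix}, \quad s, t \in \mathbb{R}.$$

So the null space consists of all vectors of the form $x = sv_1 + tv_2$,
where $v_1$ and $v_2$ are the vectors displayed above. It is a subset
of $\mathbb{R}^5$.

The range of $A$ can be described as the set of all linear combinations
of the columns of $A$,

$$R(A) = \{\alpha_1c_1 + \alpha_2c_2 + \alpha_3c_3 + \alpha_4c_4 + \alpha_5c_5 \mid \alpha_i \in \mathbb{R},\; i = 1, \ldots, 5\},$$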
where
1 0 1 0 2 
Cc = 2 CQ = l C3 = : c4 = : C5 = 2 
E 3? -1 Dos 2 
0 3 —2 2 0 


This is a subset of $\mathbb{R}^4$. We will find a better way to describe this set
when we look at the column space of a matrix in the next two chapters.


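Exercise 4.3 $|A| = 3\lambda - 9$.
(a) If $|A| \neq 0$, that is if $\lambda \neq 3$, then the system will have a unique
solution. In this case, using Cramer's rule, $z = (3 - 3\mu)/(\lambda - 3)$.

To answer (b) and (c), reduce the augmented matrix to echelon form
with $\lambda = 3$

$$(A|b) = \left(\begin{array}{ccc|c} 1 & 2 & 0 & 2 \\ 5 & 1 & 3 & 7 \\ 1 & -1 & 1 & \mu \end{array}\right) \longrightarrow \left(\begin{array}{ccc|c} 1 & 2 & 0 & 2 \\ 0 & -9 & 3 & -3 \\ 0 & -3 & 1 & \mu - 2 \end{array}\right)$$

$$\longrightarrow \left(\begin{array}{ccc|c} 1 & 2 & 0 & 2 \\ 0 & 3 & -1 & 1 \\ 0 & -3 & 1 & \mu - 2 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|c} 1 & 2 & 0 & 2 \\ 0 & 3 & -1 & 1 \\ 0 & 0 & 0 & \mu - 1 \end{array}\right).$$

So if $\lambda = 3$, this system will be inconsistent if $\mu \neq 1$, which answers
part (b).

If $\lambda = 3$ and $\mu = 1$, we have (c) infinitely many solutions. Setting
$\mu = 1$ and continuing to reduced echelon form,

$$(A|b) \longrightarrow \cdots \longrightarrow \left(\begin{array}{ccc|c} 1 & 2 & 0 & 2 \\ 0 & 1 & -\frac{1}{3} & \frac{1}{3} \\ 0 & 0 & 0 & 0 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|c} 1 & 0 & \frac{2}{3} & \frac{4}{3} \\ 0 & 1 & -\frac{1}{3} & \frac{1}{3} \\ 0 & 0 & 0 & 0 \end{array}\right).$$

The solution can now be read from the matrix. Setting the non-leading
variable $z$ equal to $t$,

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \frac{4}{3} - \frac{2}{3}t \\ \frac{1}{3} + \frac{1}{3}t \\ t \end{pmatrix} = \begin{pmatrix} \frac{4}{3} \\ \frac{1}{3} \\ 0 \end{pmatrix} + t\begin{pmatrix} -\frac{2}{3} \\ \frac{1}{3} \\ 1 \end{pmatrix}, \quad t \in \mathbb{R}.$$
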
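Exercise 4.4 The matrix $B$ must be $3 \times 4$ since the solutions are in $\mathbb{R}^4$
and $c_1 \in \mathbb{R}^3$. Let

$$B = \begin{pmatrix} c_1 & c_2 & c_3 & c_4 \end{pmatrix}, \qquad c_1 = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}.$$

The solution is of the form $x = p + sv_1 + tv_2$, where $v_1, v_2$ are in $N(B)$;
therefore, you know that $Bp = d$, $Bv_1 = 0$ and $Bv_2 = 0$. Regarding the
matrix products as linear combinations of the column vectors of $B$, we
obtain

$$Bp = c_1 + 2c_3 = d, \qquad Bv_1 = -3c_1 + c_2 = 0, \qquad Bv_2 = c_1 - c_3 + c_4 = 0.$$

Knowing $c_1$, you just need to solve these for the other three columns:

$$2c_3 = d - c_1 = \begin{pmatrix} 3 \\ 5 \\ -2 \end{pmatrix} - \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \\ -4 \end{pmatrix} \implies c_3 = \begin{pmatrix} 1 \\ 2 \\ -2 \end{pmatrix},$$

$$c_2 = 3c_1 = \begin{pmatrix} 3 \\ 3 \\ 6 \end{pmatrix}, \qquad c_4 = c_3 - c_1 = \begin{pmatrix} 1 \\ 2 \\ -2 \end{pmatrix} - \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ -4 \end{pmatrix}.$$

The matrix $B$ is

$$B = \begin{pmatrix} 1 & 3 & 1 & 0 \\ 1 & 3 & 2 & 1 \\ 2 & 6 & -2 & -4 \end{pmatrix}.$$

You can check your answer by row reducing the augmented matrix
$(B|d)$ to obtain the solution of $Bx = d$, and matching it to the solution
given.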


Exercise 4.5 You might have noticed that this is the same coefficient
matrix $A$ as we encountered in Example 4.7 on page 135. You can easily
tackle this question by forming the augmented matrix and reducing it
using row operations:

$$(A|b) = \left(\begin{array}{ccc|c} 1 & 2 & 1 & a \\ 2 & 3 & 0 & b \\ 3 & 5 & 1 & c \end{array}\right) \longrightarrow \left(\begin{array}{ccc|c} 1 & 2 & 1 & a \\ 0 & -1 & -2 & b - 2a \\ 0 & -1 & -2 & c - 3a \end{array}\right).$$

After this first step, it is clear that the system will be consistent if and
only if

$$b - 2a = c - 3a, \quad \text{or} \quad a + b - c = 0.$$

Hence, the vector $y = (x, y, z)^T$ is in $R(A)$ if and only if $x + y - z = 0$.
This is the Cartesian equation of a plane in $\mathbb{R}^3$.
The vector $d = (1, 5, 6)^T$ is in $R(A)$, since its components satisfy
the equation. This follows from Example 4.7, because the system there
was seen to be consistent. The vector $(1, 5, 4)^T$, for which the system is
inconsistent, is not in the plane $R(A)$. (Look at Example 4.9 to see that
the augmented matrix in this case has an echelon form which indicates
that the system is inconsistent.)

For Activity 4.8 on page 135, you found a general solution of the
system of equations $Ax = d$ to be

$$x = \begin{pmatrix} 7 \\ -3 \\ 0 \end{pmatrix} + t\begin{pmatrix} 3 \\ -2 \\ 1 \end{pmatrix}, \quad t \in \mathbb{R}.$$

Any solution $x$ will enable you to write $d$ as a linear combination of
the columns of $A$. For example, taking first $t = 0$ and then $t = -1$,
$d = 7c_1 - 3c_2$ or $d = 4c_1 - c_2 - c_3$; that is,

$$\begin{pmatrix} 1 \\ 5 \\ 6 \end{pmatrix} = 7\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} - 3\begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix} = 4\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} - \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix} - \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.$$


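Exercise 4.6 You need to put the matrix into row echelon form to
answer the first question, and into reduced row echelon form for the
second,

$$A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & -2 \\ 2 & -1 & 8 \\ 3 & 1 & 7 \end{pmatrix} \longrightarrow \cdots \longrightarrow \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

The rank of $A$ is 2. There is one non-leading variable. If you write
$x = (x, y, z)^T$, then setting $z = t$, you will obtain the solution

$$x = t\begin{pmatrix} -3 \\ 2 \\ 1 \end{pmatrix}, \quad t \in \mathbb{R}.$$

Since there are non-trivial solutions of $Ax = 0$, it is possible to express 0
as a linear combination of the columns of $A$ with non-zero coefficients.
A non-trivial linear combination of the column vectors which is equal
to the zero vector is given by any non-zero vector in the null space. For
example, using $t = 1$, the product $Ax$ yields

$$-3c_1 + 2c_2 + c_3 = -3\begin{pmatrix} 1 \\ 0 \\ 2 \\ 3 \end{pmatrix} + 2\begin{pmatrix} 1 \\ 1 \\ -1 \\ 1 \end{pmatrix} + \begin{pmatrix} 1 \\ -2 \\ 8 \\ 7 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} = 0.$$

The vector $b$ is in $R(A)$ if $b$ is a linear combination of the column
vectors of $A$, which is exactly when $Ax = b$ is consistent. Notice that
the matrix $A$ has rank 2, so the augmented matrix must also have
rank 2. Reducing $(A|b)$ using row operations,

$$\left(\begin{array}{ccc|c} 1 & 1 & 1 & 4 \\ 0 & 1 & -2 & 1 \\ 2 & -1 & 8 & a \\ 3 & 1 & 7 & b \end{array}\right) \longrightarrow \left(\begin{array}{ccc|c} 1 & 1 & 1 & 4 \\ 0 & 1 & -2 & 1 \\ 0 & -3 & 6 & a - 8 \\ 0 & -2 & 4 & b - 12 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|c} 1 & 1 & 1 & 4 \\ 0 & 1 & -2 & 1 \\ 0 & 0 & 0 & a - 5 \\ 0 & 0 & 0 & b - 10 \end{array}\right).$$

Therefore, $Ax = b$ is consistent if and only if $a = 5$ and $b = 10$. In that
case, continuing to reduced echelon form,

$$\left(\begin{array}{ccc|c} 1 & 1 & 1 & 4 \\ 0 & 1 & -2 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|c} 1 & 0 & 3 & 3 \\ 0 & 1 & -2 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right).$$

Therefore, a general solution is

$$x = \begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} -3 \\ 2 \\ 1 \end{pmatrix}, \quad t \in \mathbb{R}.$$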
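(b) Using row operations,

\[
|B| = \begin{vmatrix} -2 & 3 & -2 & 5 \\ 3 & -6 & 9 & -6 \\ -2 & 9 & -1 & 9 \\ 5 & -6 & 9 & -4 \end{vmatrix}
= -3 \begin{vmatrix} 1 & -2 & 3 & -2 \\ -2 & 3 & -2 & 5 \\ -2 & 9 & -1 & 9 \\ 5 & -6 & 9 & -4 \end{vmatrix}
= -3 \begin{vmatrix} 1 & -2 & 3 & -2 \\ 0 & -1 & 4 & 1 \\ 0 & 5 & 5 & 5 \\ 0 & 4 & -6 & 6 \end{vmatrix}
\]

\[
= (30) \begin{vmatrix} 1 & -2 & 3 & -2 \\ 0 & 1 & 1 & 1 \\ 0 & -1 & 4 & 1 \\ 0 & 2 & -3 & 3 \end{vmatrix}
= (30) \begin{vmatrix} 1 & -2 & 3 & -2 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 5 & 2 \\ 0 & 0 & -5 & 1 \end{vmatrix}
= 450.
\]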


Since det(B) ≠ 0, the rank of B is 4. Therefore, the main theorem
(Theorem 4.1.2) tells us that Bx = 0 has only the trivial solution.
Therefore, there is no way to write 0 as a linear combination of the
column vectors of B except the trivial way, with all coefficients equal
to 0.

Also, using this theorem, Bx = b has a unique solution for all
b ∈ R^4. Therefore, R(B) = R^4. That is, a and b can be any real
numbers; the system Bx = b is always consistent.
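
The last steps of the determinant calculation can be reproduced with a short routine. This is a sketch under stated assumptions, not the book's method verbatim: `det` is an illustrative helper that reduces to triangular form, flipping the sign on each row swap, and the matrix `M` is the intermediate 4 x 4 array that appears with the factor 30 in the calculation above.

```python
from fractions import Fraction

def det(rows):
    """Determinant by row reduction: swaps flip the sign, pivots multiply."""
    m = [[Fraction(x) for x in row] for row in rows]
    n, d = len(m), Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if m[i][c] != 0), None)
        if p is None:
            return Fraction(0)      # no pivot in this column: determinant is 0
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return d

# Intermediate matrix from the solution, after the factor 30 has been taken out.
M = [[1, -2, 3, -2],
     [0, 1, 1, 1],
     [0, -1, 4, 1],
     [0, 2, -3, 3]]
print(30 * det(M))
```

Because the result is non-zero, the rank is 4 and the conclusions about Bx = 0 and Bx = b follow as described.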


Chapter 5 exercises 


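Exercise 5.1 The set S1 is a subspace. We have

\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix} \in S_1
\iff z = y = 3x
\iff \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ 3x \\ 3x \end{pmatrix} = x \begin{pmatrix} 1 \\ 3 \\ 3 \end{pmatrix}, \quad x \in \mathbb{R}.
\]

So, the set S1 is the linear span of the vector v = (1, 3, 3)^T and is
therefore a subspace of R^3. (This is the line through the origin in the
direction of the vector v = (1, 3, 3)^T.)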

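The set S2 is a subspace. Since

\[
S_2 = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; z + y = 3x \right\}
= \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; 3x - y - z = 0 \right\},
\]

it is a plane through the origin in R^3, and you have shown that a plane
through the origin is a subspace (see Activity 5.38). You can also show
directly that the set is non-empty and closed under addition and scalar
multiplication.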

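The set S3 is not a subspace. 0 ∈ S3, but S3 is not closed under
addition. For example,

\[
\begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix} \in S_3, \quad
\begin{pmatrix} 1 \\ 3 \\ 1 \end{pmatrix} \in S_3, \quad\text{but}\quad
\begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix} + \begin{pmatrix} 1 \\ 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \\ 4 \end{pmatrix} \notin S_3,
\]

since it does not satisfy the condition zy = 3x.

The set S4 is not a subspace because it is not closed under addition.
For example,

\[
\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \in S_4, \quad
\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \in S_4, \quad\text{but}\quad
\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \notin S_4.
\]

What is S4? For a vector x to be in S4, either x = 0, y = 0 or z = 0.
So this set consists of the xy-plane (if z = 0), the xz-plane, and the
yz-plane. But any vector in R^3 which is not on one of these planes is
not in S4.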


Exercise 5.2 If A is an n × n matrix, then all vectors x for which Ax is
defined must be n × 1 vectors, so the set

\[
S = \{ \mathbf{x} \mid A\mathbf{x} = \lambda\mathbf{x} \}, \quad\text{some } \lambda \in \mathbb{R},
\]

is a subset of R^n. To show it is a subspace, you have to show it is
non-empty and closed under addition and scalar multiplication.

Since A0 = λ0 = 0, the vector 0 ∈ S, so S is non-empty. (In fact,
depending on λ, S may well be the vector space which contains only
the zero vector; more on this is found in Chapter 8.)

Let u, v ∈ S and a ∈ R. Then you know that Au = λu and Av = λv.
Therefore,

\[
A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v} = \lambda\mathbf{u} + \lambda\mathbf{v} = \lambda(\mathbf{u} + \mathbf{v})
\]

and

\[
A(a\mathbf{u}) = a(A\mathbf{u}) = a(\lambda\mathbf{u}) = \lambda(a\mathbf{u}),
\]

so u + v ∈ S and au ∈ S. Therefore, S is a subspace of R^n.


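Exercise 5.3 We are given the vectors

\[
\mathbf{v}_1 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \quad
\mathbf{v}_2 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad
\mathbf{u} = \begin{pmatrix} -1 \\ 2 \\ 5 \end{pmatrix}, \quad
\mathbf{w} = \begin{pmatrix} 1 \\ 2 \\ 5 \end{pmatrix}.
\]

(a) The vector u can be expressed as a linear combination of v1 and v2
if you can find constants s, t such that u = sv1 + tv2. Now,

\[
\begin{pmatrix} -1 \\ 2 \\ 5 \end{pmatrix} = s \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} + t \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
\iff
\left\{ \begin{array}{rcl} -1 &=& -s + t \\ 2 &=& 2t \\ 5 &=& s + 3t. \end{array} \right.
\]

From the middle component equation, we find t = 1, and substituting
this into the top equation yields s = 2. Substituting these values for s
and t in the bottom component equation gives 5 = 2 + 3(1), which is
correct, so u = 2v1 + v2. You can check this using the vectors,

\[
2 \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = \begin{pmatrix} -1 \\ 2 \\ 5 \end{pmatrix}.
\]

Attempting this for the vector w,

\[
\begin{pmatrix} 1 \\ 2 \\ 5 \end{pmatrix} = s \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} + t \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
\iff
\left\{ \begin{array}{rcl} 1 &=& -s + t \\ 2 &=& 2t \\ 5 &=& s + 3t. \end{array} \right.
\]

This time the top two component equations yield t = 1 and s = 0, and
these values do not satisfy the bottom equation, 5 = s + 3t, so no
solution exists. The vector w cannot be expressed as a linear combination
of v1 and v2.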

(b) Since u = 2v1 + v2, u ∈ Lin{v1, v2}. Therefore, Lin{v1, v2, u}
and Lin{v1, v2} are the same subspace. Any vector x = av1 + bv2 + cu
can be expressed as a linear combination of just v1 and v2 by
substituting 2v1 + v2 for u. Therefore, this is the linear span of two
non-parallel vectors in R^3, so it is a plane in R^3.

Since w ∉ Lin{v1, v2}, the subspace Lin{v1, v2, w} must be bigger
than just the plane, so it must be all of R^3. To show that Lin{v1, v2, w} =
R^3, you can establish that any b ∈ R^3 can be expressed as a linear
combination, b = av1 + bv2 + cw, or equivalently that the system of
equations Ax = b has a solution where A is the matrix whose columns
are the vectors v1, v2 and w. You can show this by reducing A to row
echelon form, or by finding the determinant. Since

\[
|A| = \begin{vmatrix} -1 & 1 & 1 \\ 0 & 2 & 2 \\ 1 & 3 & 5 \end{vmatrix} = -4 \neq 0,
\]

you know from the main theorem (Theorem 4.5) that Ax = b has a
unique solution for all b ∈ R^3.

(c) You know from part (b) that {v1, v2, w} spans R^3, and therefore
so does {v1, v2, u, w}. But more efficiently, you can take the same
approach as in part (b) to show that {v1, v2, u, w} spans R^3, and at the
same time show that any vector b ∈ R^3 can be expressed as a linear
combination of v1, v2, u, w in infinitely many ways. If B is the matrix
with these four vectors as its columns, then the solutions, x, of Bx = b
will determine the possible linear combinations of the vectors. We put
the coefficient matrix B into row echelon form (steps not shown),

\[
B = \begin{pmatrix} -1 & 1 & -1 & 1 \\ 0 & 2 & 2 & 2 \\ 1 & 3 & 5 & 5 \end{pmatrix}
\longrightarrow \cdots \longrightarrow
\begin{pmatrix} 1 & -1 & 1 & -1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\]

Since there is a leading one in every row, the system Bx = b is
always consistent, so every vector b ∈ R^3 can be expressed as a linear
combination of v1, v2, u, w. Since there is a free variable (in column
three), there are infinitely many solutions to Bx = b.
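
The hand calculation in part (a) can be mirrored in a few lines. The sketch below is illustrative only: the function name `coefficients` is an assumption, and the method relies on v1 having a zero middle component, as it does here, so it is not a general solver.

```python
from fractions import Fraction

v1, v2 = (-1, 0, 1), (1, 2, 3)

def coefficients(x):
    """Return (s, t) with x = s*v1 + t*v2, or None if no such pair exists.

    Mirrors the hand calculation: the middle row gives t (v1 has a zero
    there), the top row then gives s, and the bottom row is the check.
    """
    t = Fraction(x[1], v2[1])
    s = (x[0] - t * v2[0]) / v1[0]
    if s * v1[2] + t * v2[2] == x[2]:
        return s, t
    return None

print(coefficients((-1, 2, 5)))  # the vector u
print(coefficients((1, 2, 5)))   # the vector w
```

For u the function returns the pair s = 2, t = 1, and for w it returns None, in agreement with the conclusion that w is not in Lin{v1, v2}.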


Exercise 5.4 For v, w ∈ R^n, the set A = {v, w} contains precisely the
two vectors, v and w. The set B = Lin{v, w} contains infinitely many
vectors, namely all possible linear combinations of v and w.


Exercise 5.5 Clearly, P_n ≠ ∅. We need to show that for any α ∈ R and
f, g ∈ P_n, αf ∈ P_n and f + g ∈ P_n. Suppose that

\[
f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n, \qquad
g(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_n x^n.
\]

Then

\[
(f + g)(x) = f(x) + g(x) = (a_0 + b_0) + (a_1 + b_1)x + \cdots + (a_n + b_n)x^n,
\]

so f + g is also a polynomial of degree at most n and therefore f + g ∈
P_n. Similarly,

\[
(\alpha f)(x) = \alpha f(x) = (\alpha a_0) + (\alpha a_1)x + \cdots + (\alpha a_n)x^n,
\]

so αf ∈ P_n also. It can be seen that the set of functions
{1, x, x^2, ..., x^n} spans P_n, where 1 denotes the function that is
identically equal to 1 (that is, the function f with f(x) = 1 for all x).

(Note that the requirement that the polynomials are of degree at
most n is important. If you consider the set of polynomials of degree
exactly n, then this set is not closed under addition. For example, if
f(x) = 1 + ... + 3x^n and g(x) = 2 + ... - 3x^n, then (f + g)(x) =
f(x) + g(x) is a polynomial of degree at most n - 1.

However, if n = 1, then the set of constant functions is a subspace.
0 ∈ U and U is closed under addition and scalar multiplication.)


Exercise 5.6 If U and W are subspaces of a vector space V, then the
set

\[
U \cap W = \{ \mathbf{x} \mid \mathbf{x} \in U \text{ and } \mathbf{x} \in W \}
\]

is a subspace. It is non-empty because 0 ∈ U and 0 ∈ W, so 0 ∈ U ∩ W.
Let x, y ∈ U ∩ W, α ∈ R. Then since U is a subspace and x, y ∈ U,
both x + y and αx are in U, and the same is true with regard to W.
Therefore, both x + y and αx are in U ∩ W, so this is a subspace.

Now look at the set

\[
U \cup W = \{ \mathbf{x} \mid \mathbf{x} \in U \text{ or } \mathbf{x} \in W \}.
\]

If U ⊆ W, then U ∪ W is equal to W, which is a subspace. If W ⊆ U,
then U ∪ W = U, and this is a subspace.

Now suppose that U ⊈ W and W ⊈ U. Then there is a vector x ∈ U
which is not in W, and a vector y ∈ W with y ∉ U. Both of these vectors
are in U ∪ W, but the vector x + y = z is not. Why? You need to show
that z is not in U and not in W. If z ∈ U, then z - x ∈ U since U is a
subspace. But z - x = y, which contradicts the assumption that y ∉ U.
A similar argument shows that z ∉ W, so z ∉ U ∪ W. Therefore, the
set U ∪ W is not closed under addition and is not a subspace.

An example of this is to consider the xz-plane (vectors (x, y, z)^T
with y = 0) and the yz-plane (vectors with x = 0) in R^3. Each of these
sets is a plane through the origin, so each is a subspace of R^3. Their
intersection is the z axis, which is a line through the origin, and therefore
a subspace of R^3. Their set theoretic union is just the set of all vectors
which are on either plane. For example, the vector u = (1, 0, 0)^T is on
the xz-plane and the vector v = (0, 1, 0)^T is on the yz-plane, but their
sum u + v = (1, 1, 0)^T is not on either plane.


Chapter 6 exercises 


Exercise 6.1 To show that the vectors x1, x2, x3 are linearly independent,
you can show that the matrix A = (x1 x2 x3) has rank 3 using row
operations, or by showing |A| ≠ 0. Either method will show that the
only solution of Ax = 0 is the trivial solution.

However, since the question also asks you to express the vector v as
a linear combination of the first three, you will need to solve Ax = v. So
you can answer the entire question by reducing the augmented matrix,
(A|v), to reduced echelon form. Then the first three columns will be
the reduced row echelon form of A, which will be the identity matrix,
showing that the vectors are linearly independent, and you should obtain
the unique solution x = (2, -1, 3)^T. So

\[
\mathbf{v} = 2\mathbf{x}_1 - \mathbf{x}_2 + 3\mathbf{x}_3.
\]

Exercise 6.2 There are many ways of solving this problem, and there
are infinitely many possible such vectors x3. We can solve it by guessing
an appropriate vector x3 and then showing that the matrix (x1 x2 x3) has
rank 3.

A better approach is to answer the second part of the question first, 
determine what vectors v form a linearly dependent set with x; and x2 
(find the condition on a, b, c as asked) and then write down any vector 
whose components do not satisfy the condition. 

We write the three vectors as the columns of a matrix and row
reduce it. We take the matrix to be

\[
A = \begin{pmatrix} 1 & 2 & a \\ 1 & 3 & b \\ 2 & 5 & c \end{pmatrix}.
\]

Notice that you can choose to order x1 and x2 so that the row reduction
will be easier since it makes no difference in this question. The vectors
{x1, x2, v} will be linearly dependent if the row echelon form of A has
a row of zeros.

\[
A = \begin{pmatrix} 1 & 2 & a \\ 1 & 3 & b \\ 2 & 5 & c \end{pmatrix}
\xrightarrow{R_2 - R_1,\; R_3 - 2R_1}
\begin{pmatrix} 1 & 2 & a \\ 0 & 1 & b-a \\ 0 & 1 & c-2a \end{pmatrix}
\xrightarrow{R_3 - R_2}
\begin{pmatrix} 1 & 2 & a \\ 0 & 1 & b-a \\ 0 & 0 & c-a-b \end{pmatrix}.
\]

So the vectors will be linearly dependent if and only if the components
of v satisfy

\[
a + b - c = 0.
\]

So, choose any vector for x3 which does not satisfy this equation, such
as x3 = (1, 0, 0)^T.

Note that this condition is the equation of a plane in R^3 determined
by the vectors x1 and x2. The set {x1, x2, v} is linearly dependent if and
only if v is the position vector of a point in this plane.
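
An equivalent check uses the determinant: the three vectors are linearly dependent exactly when the determinant of the matrix with columns x1, x2, v is zero. The sketch below assumes x1 = (1, 1, 2)^T and x2 = (2, 3, 5)^T, read off from the first two columns of the matrix above; `det3` and `dependent_with` are illustrative names.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

x1, x2 = (1, 1, 2), (2, 3, 5)

def dependent_with(v):
    """True when {x1, x2, v} is linearly dependent (determinant equals zero)."""
    # Row k of the matrix holds the k-th components of x1, x2 and v.
    rows = [(x1[k], x2[k], v[k]) for k in range(3)]
    return det3(rows) == 0

print(dependent_with((1, 0, 0)))  # 1 + 0 - 0 is not 0, so a valid choice of x3
print(dependent_with((3, 4, 7)))  # x1 + x2 lies in the plane a + b - c = 0
```

Expanding the determinant symbolically gives c - a - b, so the determinant test and the row-reduction condition a + b - c = 0 agree.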


Exercise 6.3 Suppose that S is a linearly independent set of vectors.
Then the only linear combination of vectors in S that can equal the zero
vector is the trivial linear combination (in which all the coefficients
are 0). Now, suppose R = {x1, x2, ..., xr} is some subset of S and
suppose that

\[
\alpha_1 \mathbf{x}_1 + \alpha_2 \mathbf{x}_2 + \cdots + \alpha_r \mathbf{x}_r = \mathbf{0}.
\]

The x_i for i = 1, 2, ..., r are some of the vectors in S. So S will
contain these vectors and some others (let's say k others); that is, for
some vectors x_{r+1}, ..., x_{r+k}, S is the set {x_1, ..., x_r, x_{r+1}, ..., x_{r+k}}.

So we can in fact consider the left-hand side of the equation to be a
linear combination of all the vectors in S and we have

\[
\alpha_1 \mathbf{x}_1 + \alpha_2 \mathbf{x}_2 + \cdots + \alpha_r \mathbf{x}_r + 0\mathbf{x}_{r+1} + \cdots + 0\mathbf{x}_{r+k} = \mathbf{0}.
\]

By linear independence of S, it follows that all the coefficients are 0
and, in particular,

\[
\alpha_1 = \alpha_2 = \cdots = \alpha_r = 0.
\]

It follows that R is a linearly independent set of vectors.

Exercise 6.4 This question is testing your understanding of some of the
theory you have seen in this chapter. The vector equation

\[
a_1 \mathbf{v}_1 + a_2 \mathbf{v}_2 + \cdots + a_n \mathbf{v}_n = \mathbf{0}
\]

is equivalent to the matrix equation Ax = 0 with x = (a1, a2, ..., an)^T
since Ax = a1v1 + a2v2 + ... + anvn (Theorem 1.38). Therefore,
a1v1 + a2v2 + ... + anvn = 0 has only the trivial solution if and only
if Ax = 0 has only the trivial solution. But we know that Ax = 0 has
only the trivial solution if and only if |A| ≠ 0 (by Theorem 4.5).


Exercise 6.5 Let

\[
A = \begin{pmatrix} 1 & 0 & 4 & 9 \\ 2 & -1 & -11 & 2 \\ 1 & 3 & 5 & 1 \\ 2 & 4 & -1 & -3 \end{pmatrix},
\]

the matrix with columns equal to the given vectors. If we only needed
to show that the vectors were linearly dependent, it would suffice to
show, using row operations, that rank(A) < 4. But we're asked for
more: we have to find an explicit non-trivial linear combination that
equals the zero vector. So we need to find a non-trivial solution of
Ax = 0. To do this, put the matrix A into reduced row echelon form,
and then write down the general solution of Ax = 0. (You should use
row operations to find this. The details are omitted here.) One solution
is x = (5, -3, 1, -1)^T. This means that

\[
5 \begin{pmatrix} 1 \\ 2 \\ 1 \\ 2 \end{pmatrix}
- 3 \begin{pmatrix} 0 \\ -1 \\ 3 \\ 4 \end{pmatrix}
+ \begin{pmatrix} 4 \\ -11 \\ 5 \\ -1 \end{pmatrix}
- \begin{pmatrix} 9 \\ 2 \\ 1 \\ -3 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.
\]
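
It is worth checking a claimed dependence relation directly. The sketch below assumes the columns of A are (1, 2, 1, 2), (0, -1, 3, 4), (4, -11, 5, -1) and (9, 2, 1, -3), which is a reading of the exercise data rather than something guaranteed here, and forms the combination with coefficients (5, -3, 1, -1).

```python
# Columns of A (assumed exercise data) and the coefficients from x = (5, -3, 1, -1).
columns = [(1, 2, 1, 2), (0, -1, 3, 4), (4, -11, 5, -1), (9, 2, 1, -3)]
coeffs = (5, -3, 1, -1)

# Component i of the linear combination is the sum of coefficient * column entry.
combo = tuple(sum(c * col[i] for c, col in zip(coeffs, columns)) for i in range(4))
print(combo)
```

Every component of the combination vanishes, so the coefficients give a non-trivial solution of Ax = 0.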
Exercise 6.6 You are being asked to show this directly, using what you
have learned in this chapter. Let {v1, v2, ..., vn} ⊂ R^m be any set of
n vectors in R^m with n > m, and let A be the m × n matrix whose columns
are these vectors. Then the reduced row echelon form of A will have at most
m leading ones, so it will have at least n - m ≥ 1 columns without a
leading one. Therefore, the system of equations Ax = 0 will have
non-trivial solutions, and the column vectors of A will be linearly dependent.


Exercise 6.7 To prove that {v1, v2} is linearly independent, assume that
α1 and α2 are scalars such that

\[
\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 = \mathbf{0}. \qquad (*)
\]

Then

\[
\begin{aligned}
A(\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2) &= \mathbf{0} \\
\alpha_1 A\mathbf{v}_1 + \alpha_2 A\mathbf{v}_2 &= \mathbf{0} \\
\alpha_1 (2\mathbf{v}_1) + \alpha_2 (5\mathbf{v}_2) &= \mathbf{0} \\
2\alpha_1 \mathbf{v}_1 + 5\alpha_2 \mathbf{v}_2 &= \mathbf{0}.
\end{aligned}
\]

Add this last equation to -2 times equation (*) to obtain 3α2v2 = 0.
Since v2 ≠ 0, we must have α2 = 0. Substituting back into either
equation gives α1v1 = 0, so that α1 = 0 since v1 ≠ 0. This shows that
v1, v2 are linearly independent.

Generalisation 1: The same proof works for any constants, Av1 = κv1,
Av2 = λv2 provided κ ≠ λ.

Generalisation 2: It also extends to three (or more) non-zero vectors:
say, Av1 = κv1, Av2 = λv2, Av3 = μv3 with κ, λ, μ distinct constants
(that is, no two are equal).


Exercise 6.8 Observe that each set of vectors contains at least two 
linearly independent vectors since no vector in either set is a scalar 
multiple of another vector in the set. Write the vectors of each set as 
the columns of a matrix: 


-1 1 -l -1 1 1 
B=| 0 2 2 j; Asahi 22 
1 3 5 S 23 


|A| ≠ 0, so W is a basis of R^3 and Lin(W) = R^3. (Therefore, another
basis of Lin(W) is the standard basis, {e1, e2, e3}.)

|B| = 0, so the set U is linearly dependent and one of the vectors
is a linear combination of the other two. Since any two vectors of U are
linearly independent, we know that we will need two vectors for a basis
and Lin(U) is a two-dimensional subspace of R^3, which is a plane. So
we can take the first two vectors in U to be a basis of Lin(U).

There are two ways you can find the Cartesian equation of the plane.
A vector equation is given by

\[
\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = s \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} + t \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad s, t \in \mathbb{R},
\]

and you can find the Cartesian equation by equating components to
obtain three equations in the two unknowns s and t. Eliminating s and t
between the three equations, you will obtain a single equation relating
x, y, and z. Explicitly, we have x = -s + t, y = 2t, z = s + 3t, so

\[
t = \frac{y}{2}, \qquad s = t - x = \frac{y}{2} - x, \qquad\text{and so}\qquad
z = s + 3t = \left( \frac{y}{2} - x \right) + \frac{3}{2}\,y.
\]

Therefore, x - 2y + z = 0 is a Cartesian equation of the plane.

Alternatively, you could write the two basis vectors and the vector
x as the columns of a matrix M and, using the fact that |M| = 0 if and
only if the columns of M are linearly dependent, you have the equation

\[
|M| = \begin{vmatrix} -1 & 1 & x \\ 0 & 2 & y \\ 1 & 3 & z \end{vmatrix} = -2x + 4y - 2z = 0.
\]


Exercise 6.9 The xz-plane is the set of all vectors of the form (x, 0, z)^T,
so the set of vectors {e1, e3} is a basis.


Exercise 6.10 The first thing you should do is write the vectors in B as
the columns of a matrix, call it P, and evaluate the determinant. Since
|P| = -2 ≠ 0, the vectors form a basis of R^3. Since you need to find
the coordinates of two different vectors, you need to solve two systems
of equations, namely Px = w and Px = e1, to find the coefficients in
the basis B. One efficient method is to find P^{-1} and use this to solve
the equations as x = P^{-1}w and x = P^{-1}e1.

\[
\text{If}\quad P = \begin{pmatrix} 1 & -4 & 3 \\ 1 & 0 & 5 \\ 0 & 3 & 1 \end{pmatrix}
\quad\text{then}\quad
P^{-1} = -\frac{1}{2} \begin{pmatrix} -15 & 13 & -20 \\ -1 & 1 & -2 \\ 3 & -3 & 4 \end{pmatrix}.
\]

You should find that

\[
[\mathbf{w}]_B = \begin{pmatrix} -3 \\ 1 \\ 2 \end{pmatrix}_B
\quad\text{and}\quad
[\mathbf{e}_1]_B = -\frac{1}{2} \begin{pmatrix} -15 \\ -1 \\ 3 \end{pmatrix}_B;
\]

and check your result.
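Exercise 6.11 Let

\[
\mathbf{u} = u_1 \mathbf{v}_1 + u_2 \mathbf{v}_2 + \cdots + u_n \mathbf{v}_n
\quad\text{and}\quad
\mathbf{w} = w_1 \mathbf{v}_1 + w_2 \mathbf{v}_2 + \cdots + w_n \mathbf{v}_n,
\]

so that

\[
[\mathbf{u}]_B = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}_B
\quad\text{and}\quad
[\mathbf{w}]_B = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}_B.
\]

Then

\[
\begin{aligned}
\alpha\mathbf{u} + \beta\mathbf{w} &= \alpha(u_1 \mathbf{v}_1 + u_2 \mathbf{v}_2 + \cdots + u_n \mathbf{v}_n) + \beta(w_1 \mathbf{v}_1 + w_2 \mathbf{v}_2 + \cdots + w_n \mathbf{v}_n) \\
&= (\alpha u_1 + \beta w_1)\mathbf{v}_1 + (\alpha u_2 + \beta w_2)\mathbf{v}_2 + \cdots + (\alpha u_n + \beta w_n)\mathbf{v}_n.
\end{aligned}
\]

Then,

\[
[\alpha\mathbf{u} + \beta\mathbf{w}]_B
= \begin{pmatrix} \alpha u_1 + \beta w_1 \\ \alpha u_2 + \beta w_2 \\ \vdots \\ \alpha u_n + \beta w_n \end{pmatrix}_B
= \alpha \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}_B + \beta \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}_B
= \alpha[\mathbf{u}]_B + \beta[\mathbf{w}]_B.
\]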


Exercise 6.12 We will give a detailed answer to this question. To begin,
put the matrix A into reduced row echelon form.

\[
A = \begin{pmatrix} 1 & 2 & -1 & 3 \\ 2 & 3 & 0 & 1 \\ -4 & -5 & -2 & 3 \end{pmatrix}
\longrightarrow \cdots \longrightarrow
\begin{pmatrix} 1 & 0 & 3 & -7 \\ 0 & 1 & -2 & 5 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
\]

A basis of row space consists of the non-zero rows of the reduced row
echelon form (written as vectors), so

\[
\text{a basis of } RS(A) \text{ is } \left\{ \begin{pmatrix} 1 \\ 0 \\ 3 \\ -7 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ -2 \\ 5 \end{pmatrix} \right\}.
\]

A basis of the column space, CS(A), is given by the vectors of the matrix
A which correspond to the columns with the leading ones in the reduced
row echelon form of A, so these are the first two columns. Therefore,

\[
\text{a basis of } CS(A) \text{ is } \left\{ \begin{pmatrix} 1 \\ 2 \\ -4 \end{pmatrix}, \begin{pmatrix} 2 \\ 3 \\ -5 \end{pmatrix} \right\}.
\]

Since CS(A) has a basis consisting of two vectors, it is a
two-dimensional subspace of R^3, which is a plane. To find a Cartesian
equation of this plane, you can use

\[
\begin{vmatrix} 1 & 2 & x \\ 2 & 3 & y \\ -4 & -5 & z \end{vmatrix} = 0.
\]

(Why does this give you the equation of the plane? Because if the
vector x = (x, y, z)^T is in the plane, then it is a linear combination of
the basis vectors, so the three vectors are linearly dependent and the
matrix whose columns are these three vectors must have determinant
equal to 0.) Expanding the determinant by the last column is easiest,
and you should obtain the equation

\[
2x - 3y - z = 0.
\]

(Note that this must be an equation: don't leave off the '= 0' part. This
is the equation of a plane through the origin, which is a subspace of R^3.)
The next thing you should do is check that your solution is correct. The
components of all the column vectors of A should satisfy this equation.
For example, 2(1) - 3(2) - (-4) = 0 and 2(2) - 3(3) - (-5) = 0. The
equation is also satisfied by the last two columns, (-1, 0, -2)^T and
(3, 1, 3)^T, as you can easily check.

You are asked to state the rank–nullity theorem for matrices, ensuring
that you define each term and use it to determine the dimension of
the null space, N(A). The theorem can be stated either as

\[
\operatorname{rank}(A) + \operatorname{nullity}(A) = n
\]

or

\[
\dim(R(A)) + \dim(N(A)) = n,
\]

where n is the number of columns in the matrix A. If you used the terms
rank and nullity, then you must say what these terms mean: rank(A) =
dim(R(A)) and nullity(A) = dim(N(A)). Since dim(CS(A)) = 2 and
n = 4, this theorem tells us that dim(N(A)) = 2.
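
The row reduction and the dimension count for this matrix can be reproduced with exact arithmetic. The following is a minimal sketch: `rref` is an illustrative helper written for this note (not a library routine), and A is taken to have the columns (1, 2, -4), (2, 3, -5), (-1, 0, -2), (3, 1, 3) quoted in the solution.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over exact fractions; returns (matrix, pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                       # no leading one in this column
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]  # scale to get a leading one
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

A = [[1, 2, -1, 3], [2, 3, 0, 1], [-4, -5, -2, 3]]
R, pivots = rref(A)
rank = len(pivots)
nullity = len(A[0]) - rank          # rank-nullity: rank + nullity = number of columns

# Every column of A should satisfy the plane equation 2x - 3y - z = 0.
on_plane = all(2 * A[0][j] - 3 * A[1][j] - A[2][j] == 0 for j in range(4))

print([[int(x) for x in row] for row in R])
print(pivots, rank, nullity, on_plane)
```

The pivot columns identify which columns of A form a basis of CS(A), and the nullity is obtained from the rank without solving Ax = 0.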

You are now asked for what real values of a the vector

\[
\mathbf{b}(a) = \begin{pmatrix} -1 \\ a \\ a^2 \end{pmatrix}
\]

is in the range of A, R(A). The range of A is equal to the column space
of A, and you already know that this subspace of R^3 is a plane with
Cartesian equation

\[
2x - 3y - z = 0.
\]

The vector (-1, a, a^2)^T ∈ R(A) if and only if its components satisfy
this equation. Substituting, you obtain a quadratic equation in a,

\[
2(-1) - 3(a) - (a^2) = 0 \quad\text{or}\quad a^2 + 3a + 2 = 0,
\]

which factors: a^2 + 3a + 2 = (a + 2)(a + 1) = 0. Therefore, the only
solutions are a = -1 and a = -2. The corresponding vectors are

\[
\mathbf{b}(-1) = \begin{pmatrix} -1 \\ -1 \\ 1 \end{pmatrix}
\quad\text{and}\quad
\mathbf{b}(-2) = \begin{pmatrix} -1 \\ -2 \\ 4 \end{pmatrix}.
\]

You might notice that the second vector listed above is equal to -1
times the first column of the matrix A.

There are other ways to obtain this result, but they take longer. For
example, you could write

\[
\begin{pmatrix} -1 \\ a \\ a^2 \end{pmatrix} = s \begin{pmatrix} 1 \\ 2 \\ -4 \end{pmatrix} + t \begin{pmatrix} 2 \\ 3 \\ -5 \end{pmatrix},
\]

giving three equations, one for each component of the vector equation,
and eliminate s and t to obtain the same quadratic equation in a.


Exercise 6.13 You have already shown that A^T A is symmetric as an
exercise in Chapter 1 (Exercise 1.6). To show it is invertible, we will
show that (A^T A)v = 0 has only the trivial solution, v = 0, which implies
that A^T A is invertible by Theorem 4.5. To do this, we first need to show
that (A^T A)v = 0 implies that Av = 0. This is the difficult part. Then
we can deduce from Av = 0 that v = 0, since A has rank k.

We will give two arguments to show that Av = 0. The first is a bit
tricky. We multiply A^T Av = 0 on the left by v^T to get v^T A^T Av = 0.
Now, v^T A^T Av = (Av)^T (Av) = 0. But for any vector w ∈ R^m, w^T w =
⟨w, w⟩ = ||w||^2, so we have ||Av||^2 = 0, which implies that Av = 0.

Alternatively, we can show Av = 0 by asking what A^T Av =
A^T (Av) = 0 implies about the vector Av; that is, in what two subspaces
associated with the matrix A^T is it? Since A^T (Av) = 0, Av ∈ N(A^T).
Also, the vector Av is a linear combination of the columns of A, hence
it is a linear combination of the rows of A^T, so Av ∈ RS(A^T). But
we have seen that the only vector which is in both the null space and
the row space of a matrix is the zero vector, since these subspaces are
orthogonal subspaces (of R^m for A^T). Hence the vector Av = 0.

So Av = 0. But the columns of A are linearly independent (A
has full column rank), so Av = 0 has only the trivial solution v = 0.
Therefore, A^T Av = 0 has only the trivial solution and the matrix A^T A
is invertible.

The columns of the 3 × 2 matrix M are linearly independent, since
they are not scalar multiples of one another. The 2 × 2 matrix M^T M is

\[
M^T M = \begin{pmatrix} 1 & 3 & 1 \\ -2 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -2 \\ 3 & 0 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 11 & -1 \\ -1 & 5 \end{pmatrix},
\]

which is symmetric and invertible since |M^T M| = 54 ≠ 0.
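
The product M^T M and its determinant are easy to confirm numerically. In the sketch below the entries of M, with columns (1, 3, 1) and (-2, 0, 1), are an assumption chosen to be consistent with the determinant 54 stated in the solution; `transpose` and `matmul` are small illustrative helpers.

```python
def transpose(X):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Matrix product XY for lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

# Assumed entries of the 3 x 2 matrix M (columns are not scalar multiples).
M = [[1, -2],
     [3, 0],
     [1, 1]]

G = matmul(transpose(M), M)                  # the 2 x 2 matrix M^T M
det_G = G[0][0] * G[1][1] - G[0][1] * G[1][0]

print(G, det_G, G == transpose(G))
```

The symmetry check G == transpose(G) and the non-zero determinant together illustrate the two properties proved above for a matrix with full column rank.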


Exercise 6.14 From the given information, you can determine that
k = 3, since the rows of B (written as vectors) must be in R^3. You
cannot determine m, but you can say that m ≥ 2 because you know that
B has rank 2, since its row space is a plane.

Can you determine the null space of B? Yes, because the row
space and the null space are orthogonal subspaces of R^3, or simply
because you know that the null space consists of all vectors for which
Bx = 0, so all vectors such that ⟨r_i, x⟩ = 0 for each row, r_i, of B.
Therefore, the null space must consist of all vectors on the line through
the origin in the direction of the normal vector to the plane. So a
basis of this space is given by n = (4, -5, 3)^T and a general solution
of Bx = 0 is

\[
\mathbf{x} = t \begin{pmatrix} 4 \\ -5 \\ 3 \end{pmatrix}, \quad t \in \mathbb{R}.
\]


Exercise 6.15 The subspace W has a basis consisting of the three 
sequences, 


y1 = {1,0,0,0,0...}, y2 = {0,1,0,0,0...},
y3 = {0,0,1,0,0...},


so it has dimension 3. 


Chapter 7 exercises 


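Exercise 7.1 The matrix A_T and its reduced row echelon form are

\[
A_T = \begin{pmatrix} 1 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 3 \end{pmatrix}
\longrightarrow \cdots \longrightarrow
\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
\]

A basis for the null space is {(-1, -1, 1)^T}, and a basis for the range is

\[
\left\{ \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \right\}.
\]

There are other possible answers. To verify the rank–nullity theorem,

\[
\operatorname{rank}(T) + \operatorname{nullity}(T) = 2 + 1 = 3 = \dim(\mathbb{R}^3).
\]

This linear transformation is not invertible, as A_T^{-1} does not exist.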


Exercise 7.2 To sketch the effect of S on the unit square, mark off a unit
square on a set of axes. Mark the unit vector in the x direction, e1, in
one colour, and the unit vector in the y direction, e2, in another colour
(or differentiate between them by single and double arrowheads). Now
draw the vector images of these, S(e1) and S(e2), in the same colours,
and complete the image of the unit square with these vectors as its two
corresponding sides. Do the same for T.

The linear transformation S is reflection in the line y = x. The
transformation T is a rotation clockwise by an angle π/2 radians (or a
rotation anticlockwise by 3π/2).

ST means first do T and then do S, so this will place the unit
square in the second quadrant with ST(e1) = (-1, 0)^T and e2 back in
its original position.

TS means reflect and then rotate, after which the unit square will
be in the fourth quadrant, with TS(e2) = (0, -1)^T and e1 back in its
original place.

Their matrices are

\[
A_{ST} = A_S A_T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}
\]

and

\[
A_{TS} = A_T A_S = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]

These matrices are not equal, A_ST ≠ A_TS. The columns of A_ST are
ST(e1) and ST(e2), and these do match the sketch. Check the columns
of A_TS.
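
The two compositions can be checked by multiplying the matrices in each order. This sketch assumes the matrices for S (reflection in y = x) and T (clockwise rotation by π/2) are [[0, 1], [1, 0]] and [[0, 1], [-1, 0]] respectively; `matmul` is an illustrative helper.

```python
def matmul(X, Y):
    """Matrix product XY for lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

A_S = [[0, 1], [1, 0]]    # reflection in the line y = x
A_T = [[0, 1], [-1, 0]]   # rotation clockwise by pi/2

A_ST = matmul(A_S, A_T)   # first T, then S
A_TS = matmul(A_T, A_S)   # first S, then T

# Column j of each product is the image of the standard basis vector e_{j+1}.
print(A_ST, [row[0] for row in A_ST], [row[1] for row in A_ST])
print(A_TS, [row[0] for row in A_TS], [row[1] for row in A_TS])
```

Reading off the columns shows that ST fixes e2 and sends e1 to its negative, while TS fixes e1 and sends e2 to its negative, so the order of composition matters.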


Exercise 7.3 Write the vectors v_i as the columns of a matrix,

\[
P_B = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & 1 \\ -1 & 2 & 3 \end{pmatrix}.
\]

Since |P_B| = 2 ≠ 0, the columns are linearly independent and hence
form a basis of R^3. P_B is the transition matrix from B coordinates to
standard coordinates, v = P_B[v]_B. Finding P_B^{-1} by the cofactor method,
or otherwise, the B coordinates of u are

\[
[\mathbf{u}]_B = P_B^{-1}\mathbf{u} = \frac{1}{2} \begin{pmatrix} 1 & 3 & -1 \\ -1 & 3 & -1 \\ 1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ -3 \end{pmatrix} = \begin{pmatrix} 5 \\ 4 \\ -2 \end{pmatrix}_B.
\]

Hence u = 5v1 + 4v2 - 2v3.
Using properties of linear transformations,

\[
\begin{aligned}
S(\mathbf{u}) &= S(5\mathbf{v}_1 + 4\mathbf{v}_2 - 2\mathbf{v}_3) \\
&= 5S(\mathbf{v}_1) + 4S(\mathbf{v}_2) - 2S(\mathbf{v}_3) \\
&= 5\mathbf{e}_1 + 4\mathbf{e}_2 - 2\mathbf{e}_3 = \begin{pmatrix} 5 \\ 4 \\ -2 \end{pmatrix}.
\end{aligned}
\]

Since R(S) is spanned by {e1, e2, e3}, R(S) = R^3 and N(S) = {0}. The
linear transformation S is the inverse of the linear transformation T
with T(e1) = v1, T(e2) = v2, T(e3) = v3, which has matrix P_B, so the
matrix A_S is P_B^{-1}.


Exercise 7.4 If we had such a linear transformation, T : R^3 → R^2, then

\[
N(T) = \left\{ t \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \;\middle|\; t \in \mathbb{R} \right\},
\]

so that a basis of N(T) is the vector (1, 1, 1)^T. The rank–nullity theorem
states that the dimension of the range plus the dimension of the null
space is equal to the dimension of the domain, R^3. We have nullity(T) =
1 and rank(T) = 2 since R(T) = R^2. That is, rank(T) + nullity(T) =
2 + 1 = 3, so the theorem would be satisfied. Note that this does not
guarantee the existence of T, but if it did not hold, then we would know
for sure that such a T could not exist.

Since T : R^3 → R^2, we are looking for a 2 × 3 matrix A such that
T(x) = Ax. Given that T(e1) and T(e2) are the standard basis vectors
of R^2, we know that the first two columns of A should be these two
vectors. What about the third column? We can obtain this column from
the basis of the null space since we already have the first two. If c1, c2, c3
denote the columns of A, then c1 + c2 + c3 = 0, and therefore

\[
T(\mathbf{x}) = A\mathbf{x} = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x - z \\ y - z \end{pmatrix}.
\]

Exercise 7.5 Only the linear transformation TS is defined, with A_TS =
A_T A_S,


mG a § -G4 

TS = get | ee aniz je 
0 1 1/2 1304 1 4 1 

Exercise 7.6 The linear transformation T is given by T(x) = Ax, where
A is a 3 × 4 matrix. The simplest way to answer the questions is to
construct the matrix whose columns are the images of the standard
basis vectors, T(e_i),

\[
A = \begin{pmatrix} 1 & 2 & 5 & x \\ 0 & 1 & 1 & y \\ -1 & 2 & -1 & z \end{pmatrix}.
\]

In order to consider the two possibilities in parts (i) and (ii), row reduce
this matrix, beginning with R3 + R1,

\[
A \longrightarrow \begin{pmatrix} 1 & 2 & 5 & x \\ 0 & 1 & 1 & y \\ 0 & 4 & 4 & z+x \end{pmatrix}
\longrightarrow \begin{pmatrix} 1 & 2 & 5 & x \\ 0 & 1 & 1 & y \\ 0 & 0 & 0 & z+x-4y \end{pmatrix}.
\]

(i) By the rank–nullity theorem, dim(R(T)) + dim(N(T)) = dim V, and
since T : R^4 → R^3, dim V = 4. So for the dimensions of R(T) and N(T)
to be equal, the subspaces must both have dimension 2. Looking at the
reduced form of the matrix, we see that this will happen if

\[
x - 4y + z = 0.
\]


If the vector x satisfies this condition, then a basis of R(T) is given by
the columns of A corresponding to the leading ones in the row echelon
form, which will be the first two columns. So a basis of R(T) is {v1, v2}.

You could also approach this question by first deducing from the
rank–nullity theorem that dim(R(T)) = 2 as above, so R(T) is a plane
in R^3. Therefore, {v1, v2} is a basis, and the Cartesian equation of the
plane is given by

\[
\begin{vmatrix} 1 & 2 & x \\ 0 & 1 & y \\ -1 & 2 & z \end{vmatrix} = x - 4y + z = 0.
\]

The components of the vector v3 satisfy this equation, and this is the
condition that the components of x must satisfy.

(ii) If the linear transformation has dim(N(T)) = 1, then by the
rank–nullity theorem, you know that dim(R(T)) = 3 (and therefore
R(T) = R^3), so the echelon form of the matrix A needs to have three
leading ones. Therefore, the condition that the components of x must
satisfy is

\[
x - 4y + z \neq 0.
\]

Now you can continue with row reducing the matrix A to obtain a basis
for N(T). The row echelon form of A will have a leading one in the last
column (first multiply the last row by 1/(x - 4y + z) to get this leading
one, then continue to reduced echelon form),

\[
A \longrightarrow \begin{pmatrix} 1 & 2 & 5 & x \\ 0 & 1 & 1 & y \\ 0 & 0 & 0 & 1 \end{pmatrix}
\longrightarrow \begin{pmatrix} 1 & 2 & 5 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\longrightarrow \begin{pmatrix} 1 & 0 & 3 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\]

So a basis of ker(T) is given by the vector w = (-3, -1, 1, 0)^T.


Exercise 7.7 The easiest method to determine the required values of λ
is to evaluate the determinant of the matrix whose columns are these
vectors, and then find for what values of λ the determinant is zero:

\[
\begin{vmatrix} 1 & 1 & 2 \\ 3 & -1 & 0 \\ -5 & 1 & \lambda \end{vmatrix} = -4\lambda - 4 = 0.
\]

So you can conclude that the set of vectors is a basis for all values of λ
except λ = -1.

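Therefore, each of the sets B and S is a basis of R^3. There are two
methods you can use to find the transition matrix P from S coordinates
to B coordinates. One way is to write down the transition matrix P_B
from B coordinates to standard, and the transition matrix P_S from S
coordinates to standard, and then calculate P = P_B^{-1} P_S. Alternatively,
you can use the fact that the columns of P are the B coordinates of the
basis S vectors. Since the first two vectors of each basis are the same,
we will do the latter. We have,

\[
[\mathbf{v}_1]_B = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}_B
\quad\text{and}\quad
[\mathbf{v}_2]_B = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}_B,
\]

so it only remains to find [s]_B. This can be done by Gaussian
elimination. To find constants a, b, c, such that s = av1 + bv2 + cb = Ax,
we reduce the augmented matrix:

\[
\left(\begin{array}{ccc|c} 1 & 1 & 2 & 2 \\ 3 & -1 & 0 & 0 \\ -5 & 1 & 1 & 3 \end{array}\right)
\longrightarrow \cdots \longrightarrow
\left(\begin{array}{ccc|c} 1 & 0 & 0 & -\frac{1}{2} \\ 0 & 1 & 0 & -\frac{3}{2} \\ 0 & 0 & 1 & 2 \end{array}\right).
\]

It follows that

\[
[\mathbf{s}]_B = \begin{pmatrix} -\frac{1}{2} \\ -\frac{3}{2} \\ 2 \end{pmatrix}_B.
\]

(You should carry out this row reduction and all the omitted
calculations.) Therefore, the transition matrix P from S coordinates to B
coordinates is

\[
P = \big( [\mathbf{v}_1]_B \;\; [\mathbf{v}_2]_B \;\; [\mathbf{s}]_B \big) = \begin{pmatrix} 1 & 0 & -\frac{1}{2} \\ 0 & 1 & -\frac{3}{2} \\ 0 & 0 & 2 \end{pmatrix}.
\]

Then, if

\[
[\mathbf{w}]_S = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}_S, \qquad
[\mathbf{w}]_B = P[\mathbf{w}]_S = \begin{pmatrix} 1 & 0 & -\frac{1}{2} \\ 0 & 1 & -\frac{3}{2} \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ -1 \\ 4 \end{pmatrix}_B.
\]

You can check this result by finding the standard coordinates of w
from each of these. (You will find w = (7, 1, 3)^T.)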
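
The suggested check, recovering the standard coordinates of w from both coordinate vectors, can be sketched as follows. The basis vectors used here, v1 = (1, 3, -5), v2 = (1, -1, 1), b = (2, 0, 1) and s = (2, 0, 3), together with [w]_S = (1, 2, 2), are assumptions read from the augmented matrix and the surrounding calculation; `combine` is an illustrative helper.

```python
from fractions import Fraction as F

# Assumed data: B = {v1, v2, b} and S = {v1, v2, s}.
v1, v2 = (1, 3, -5), (1, -1, 1)
b, s = (2, 0, 1), (2, 0, 3)

# Transition matrix from S coordinates to B coordinates; its columns are
# [v1]_B, [v2]_B and [s]_B.
P = [[1, 0, F(-1, 2)],
     [0, 1, F(-3, 2)],
     [0, 0, 2]]

def combine(coeffs, vectors):
    """Linear combination sum(c_i * vector_i), returned as a tuple."""
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(3))

w_S = (1, 2, 2)
w_B = tuple(sum(p * c for p, c in zip(row, w_S)) for row in P)

print(w_B)
print(combine(w_S, (v1, v2, s)))   # standard coordinates via the basis S
print(combine(w_B, (v1, v2, b)))   # standard coordinates via the basis B
```

Both routes should give the same standard coordinates, which is exactly the consistency check recommended in the solution.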


Exercise 7.8 To show that each of the vectors in the sets S and B are 
in W, you just need to substitute the components of each vector into 
the equation of the plane and show that the equation is satisfied. For 
example, x — 2y + 3z = (2) — 2(1) + 3(0) = 0. Each set contains two 
linearly independent vectors (neither is a scalar multiple of the other), 
and you know that a plane is a two-dimensional subspace of R*. Two 
linearly independent vectors in a vector space of dimension 2 are a 
basis, so each of the sets S and B is a basis of W. 

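Since $x - 2y + 3z = (5) - 2(7) + 3(3) = 0$, $\mathbf{v} \in W$. Its coordinates in the basis $S$ are easily found (because of the zeros and ones in the basis vectors),

$$\begin{pmatrix}5\\7\\3\end{pmatrix} = 7\begin{pmatrix}2\\1\\0\end{pmatrix} + 3\begin{pmatrix}-3\\0\\1\end{pmatrix} \quad\Longrightarrow\quad [\mathbf{v}]_S = \begin{pmatrix}7\\3\end{pmatrix}_S.$$

To find the transition matrix $M$ from $B$ to $S$, you can use the fact that $M = ([\mathbf{b}_1]_S, [\mathbf{b}_2]_S)$, where $B = \{\mathbf{b}_1, \mathbf{b}_2\}$. As we just saw for the vector $\mathbf{v}$, because of the zeros and ones in the basis $S$, we must have

$$[\mathbf{b}_1]_S = \begin{pmatrix}1\\1\end{pmatrix}_S \quad\text{and}\quad [\mathbf{b}_2]_S = \begin{pmatrix}2\\1\end{pmatrix}_S.$$

Then the required transition matrix $M$ is

$$M = \begin{pmatrix}1&2\\1&1\end{pmatrix},$$

so $[\mathbf{x}]_S = M[\mathbf{x}]_B$ for all $\mathbf{x} \in W$. The matrix $M$ is much easier to calculate directly than its inverse matrix, which is why the question was posed this way. However, you need to use the matrix $M^{-1}$ to change from $S$ coordinates to $B$ coordinates.

$$M^{-1} = \begin{pmatrix}-1&2\\1&-1\end{pmatrix}, \qquad [\mathbf{v}]_B = M^{-1}[\mathbf{v}]_S = \begin{pmatrix}-1&2\\1&-1\end{pmatrix}\begin{pmatrix}7\\3\end{pmatrix}_S = \begin{pmatrix}-1\\4\end{pmatrix}_B,$$

which is easily checked (by calculating $\mathbf{v} = -\mathbf{b}_1 + 4\mathbf{b}_2$).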
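
The change of coordinates in Exercise 7.8 can be replayed numerically. The following is a minimal sketch, assuming the matrix $M$ and the basis $S = \{(2,1,0)^T, (-3,0,1)^T\}$ read off from the solution above; the helper names `matvec` and `inverse_2x2` are illustrative only.

```python
from fractions import Fraction

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

M = [[1, 2], [1, 1]]      # columns are [b1]_S and [b2]_S
v_S = [7, 3]              # coordinates of v in the basis S
v_B = matvec(inverse_2x2(M), v_S)

# Rebuild v in standard coordinates from its S coordinates.
s1, s2 = (2, 1, 0), (-3, 0, 1)
v = [v_S[0] * p + v_S[1] * q for p, q in zip(s1, s2)]
print(v_B, v)
```

Exact fractions are used so that the inverse is not affected by rounding.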


Exercise 7.9 The answer, using the notation in Section 7.4, is $A_{[B,B]} = P_B^{-1} A P_B$, where $T(\mathbf{x}) = A\mathbf{x}$ for all $\mathbf{x} \in \mathbb{R}^3$. Now,


and 
-1 2 2 1 
aaa o 
Prejt 2 1) =[-3 3° -3], 
-1 2 2 3 73 5 


where we have omitted the details of the calculation of this inverse. It 
then follows that 


1 3 
Aig.g] = Pg APg = | 0 1 


—2 —-1 


Exercise 7.10 $C^\infty(\mathbb{R})$ is not empty; for example, the zero function is in this set, and so is $e^x$. By standard results of calculus, the sum of two differentiable functions is again differentiable, and a scalar multiple of a differentiable function is differentiable. Thus, if $f, g \in C^\infty(\mathbb{R})$, then their sum $f + g$ is also differentiable arbitrarily often, and the same is true of $\alpha f$ for any $\alpha \in \mathbb{R}$. So $C^\infty(\mathbb{R})$ is closed under addition and scalar multiplication.

The function $D$ is a mapping $D : C^\infty(\mathbb{R}) \to C^\infty(\mathbb{R})$ since $D(f) = f'$, where $f' : x \mapsto f'(x)$ and $f'$ can also be differentiated arbitrarily often. To show $D$ is a linear operator, you only need to show that it is a linear function. We have

$$D(f + g) = (f + g)' = f' + g' = D(f) + D(g)$$

and

$$D(\alpha f) = (\alpha f)' = \alpha f' = \alpha D(f).$$

These are just the rules of differentiation which you encounter in calculus; that is, the derivative of the sum of two functions is the sum of the derivatives, and the derivative of a scalar multiple of a function is the scalar multiple of the derivative.


Chapter 8 exercises 


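Exercise 8.1 To diagonalise the matrix $A$, first find the characteristic equation and solve for the eigenvalues.

$$|A - \lambda I| = \begin{vmatrix} 4-\lambda & 5 \\ -1 & -2-\lambda \end{vmatrix} = \lambda^2 - 2\lambda - 3 = (\lambda - 3)(\lambda + 1) = 0.$$

The eigenvalues are $\lambda = 3$ and $\lambda = -1$. Next find a corresponding eigenvector for each eigenvalue:

$$\lambda_1 = -1: \quad A + I = \begin{pmatrix}5&5\\-1&-1\end{pmatrix} \longrightarrow \begin{pmatrix}1&1\\0&0\end{pmatrix} \quad\Longrightarrow\quad \mathbf{v}_1 = \begin{pmatrix}-1\\1\end{pmatrix},$$

$$\lambda_2 = 3: \quad A - 3I = \begin{pmatrix}1&5\\-1&-5\end{pmatrix} \longrightarrow \begin{pmatrix}1&5\\0&0\end{pmatrix} \quad\Longrightarrow\quad \mathbf{v}_2 = \begin{pmatrix}-5\\1\end{pmatrix}.$$

Then you can choose

$$P = \begin{pmatrix}-1&-5\\1&1\end{pmatrix}, \quad\text{then}\quad D = \begin{pmatrix}-1&0\\0&3\end{pmatrix} \quad\text{and}\quad P^{-1}AP = D.$$

Check that your eigenvectors are correct by calculating $AP$,

$$AP = \begin{pmatrix}4&5\\-1&-2\end{pmatrix}\begin{pmatrix}-1&-5\\1&1\end{pmatrix} = \begin{pmatrix}1&-15\\-1&3\end{pmatrix} = (\,(-1)\mathbf{v}_1 \;\; 3\mathbf{v}_2\,) = PD.$$

This checks that $A\mathbf{v}_1 = (-1)\mathbf{v}_1$ and $A\mathbf{v}_2 = 3\mathbf{v}_2$. You can also check your answer by finding $P^{-1}$ and calculating $P^{-1}AP$:

$$P^{-1}AP = \frac{1}{4}\begin{pmatrix}1&5\\-1&-1\end{pmatrix}\begin{pmatrix}1&-15\\-1&3\end{pmatrix} = \begin{pmatrix}-1&0\\0&3\end{pmatrix} = D.$$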
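
The check $AP = PD$ recommended in Exercise 8.1 takes only a few lines to automate. This is a small sketch in plain Python; `matmul` is a helper written here for illustration, not something defined in the text.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

A = [[4, 5], [-1, -2]]
P = [[-1, -5], [1, 1]]    # columns are the eigenvectors v1 and v2
D = [[-1, 0], [0, 3]]     # matching eigenvalues on the diagonal

AP = matmul(A, P)
PD = matmul(P, D)
print(AP, PD, AP == PD)
```

Comparing $AP$ with $PD$ avoids computing $P^{-1}$, which is why it is the quicker of the two checks.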
Exercise 8.2 We'll provide an answer to this question, but leave the
calculations to you. If you have carried out the steps carefully and
checked that $AP = PD$, you should have a correct answer.

The characteristic polynomial is $-\lambda^3 + 14\lambda^2 - 48\lambda$, which is easily
factorised as $-\lambda(\lambda - 6)(\lambda - 8)$. So the eigenvalues are $0, 6, 8$. Corresponding
eigenvectors, respectively, are calculated to be non-zero scalar


multiples of 
1 1 1 
| p l ) | ) ' | ) 
2 2 0 


We may therefore take 


1 1 1 
P= (= 2 ‘ D = diag(0, 6, 8), 
2 20 


and then $P^{-1}AP = D$. Your answer is correct as long as $AP = PD$
and your eigenvectors are scalar multiples of the ones given (taken in
any order as the columns of $P$, as long as $D$ matches).


Exercise 8.3 The matrix $A$ has only one eigenvalue, $\lambda = 1$. The corresponding
eigenvectors are all the non-zero scalar multiples of $(1, 0)^T$,
so there cannot be two linearly independent eigenvectors, and hence the
matrix is not diagonalisable.

The eigenvalues of the matrix B are 0 and 2. Since this matrix has 
distinct eigenvalues, it can be diagonalised. 


Exercise 8.4 If $M$ is an $n \times n$ matrix and $\lambda$ is a real number such that
$M\mathbf{v} = \lambda\mathbf{v}$ for some non-zero vector $\mathbf{v}$, then $\lambda$ is an eigenvalue of $M$
with corresponding eigenvector $\mathbf{v}$.


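Exercise 8.5 We will give this solution in some detail. We have

$$A\mathbf{v} = \begin{pmatrix}6&13&-8\\2&5&-2\\7&17&-9\end{pmatrix}\begin{pmatrix}1\\0\\1\end{pmatrix} = \begin{pmatrix}-2\\0\\-2\end{pmatrix} = -2\mathbf{v},$$

which shows that $\mathbf{v}$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda_1 = -2$.

The fact that $T(\mathbf{x}) = A\mathbf{x} = \mathbf{x}$ for some non-zero vector $\mathbf{x}$ tells us that $\mathbf{x}$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda_2 = 1$. To find the eigenvector $\mathbf{x}$, we solve $(A - I)\mathbf{x} = \mathbf{0}$,

$$\begin{pmatrix}5&13&-8\\2&4&-2\\7&17&-10\end{pmatrix} \longrightarrow \begin{pmatrix}1&2&-1\\5&13&-8\\7&17&-10\end{pmatrix} \longrightarrow \begin{pmatrix}1&2&-1\\0&3&-3\\0&3&-3\end{pmatrix} \longrightarrow \begin{pmatrix}1&2&-1\\0&1&-1\\0&0&0\end{pmatrix} \longrightarrow \begin{pmatrix}1&0&1\\0&1&-1\\0&0&0\end{pmatrix}.$$

So a suitable eigenvector is $\mathbf{x} = (-1, 1, 1)^T$. To check this,

$$A\mathbf{x} = \begin{pmatrix}6&13&-8\\2&5&-2\\7&17&-9\end{pmatrix}\begin{pmatrix}-1\\1\\1\end{pmatrix} = \begin{pmatrix}-1\\1\\1\end{pmatrix} = \mathbf{x}.$$

To diagonalise $A$, we need to find the remaining eigenvalue and eigenvector. You can do this by finding the characteristic equation $|A - \lambda I| = 0$ and solving for $\lambda$. But there is an easier way using $|A|$. Evaluating the determinant (say, by using the cofactor expansion by row 1) you should find that $|A| = -6$. Since the determinant is the product of the eigenvalues, and since we already have $\lambda_1 = -2$ and $\lambda_2 = 1$, we can deduce that the third eigenvalue is $\lambda_3 = 3$. Then a corresponding eigenvector is obtained from solving $(A - 3I)\mathbf{v} = \mathbf{0}$,

$$A - 3I = \begin{pmatrix}3&13&-8\\2&2&-2\\7&17&-12\end{pmatrix} \longrightarrow \cdots \longrightarrow \begin{pmatrix}1&0&-\frac12\\0&1&-\frac12\\0&0&0\end{pmatrix}.$$

So we can let $\mathbf{v}_3 = (1, 1, 2)^T$.

Then take $P$ to be the matrix whose columns are these three eigenvectors (in any order) and take $D$ to be the diagonal matrix with the corresponding eigenvalues in corresponding columns. For example, if you let

$$P = \begin{pmatrix}1&-1&1\\0&1&1\\1&1&2\end{pmatrix}, \quad\text{then}\quad D = \begin{pmatrix}-2&0&0\\0&1&0\\0&0&3\end{pmatrix}.$$

We have already checked the first two eigenvectors, so we should now check the last one, either by calculating $AP$, or just multiplying,

$$A\mathbf{v}_3 = \begin{pmatrix}6&13&-8\\2&5&-2\\7&17&-9\end{pmatrix}\begin{pmatrix}1\\1\\2\end{pmatrix} = \begin{pmatrix}3\\3\\6\end{pmatrix} = 3\mathbf{v}_3.$$

Then we know that $P^{-1}AP = D$.

The linear transformation $T(\mathbf{x}) = A\mathbf{x}$ can be described as: $T$ is a stretch by a factor of three along the line $\mathbf{x} = t\mathbf{v}_3$, $t \in \mathbb{R}$; a stretch by a factor of two and reversal of direction for any vector on the line $\mathbf{x} = t\mathbf{v}$, $t \in \mathbb{R}$; and $T$ fixes every vector on the line $\mathbf{x} = t\mathbf{v}_2$, $t \in \mathbb{R}$.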
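
The two facts used in Exercise 8.5 (each $A\mathbf{v} = \lambda\mathbf{v}$, and $|A|$ equal to the product of the eigenvalues) can be confirmed with a short script. This is only a sketch; `det3` is a hypothetical helper implementing the cofactor expansion along the first row.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def det3(M):
    """Determinant of a 3x3 matrix, cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[6, 13, -8], [2, 5, -2], [7, 17, -9]]
pairs = [(-2, (1, 0, 1)), (1, (-1, 1, 1)), (3, (1, 1, 2))]

# Each pair (lam, v) should satisfy A v = lam v.
checks = [matvec(A, v) == [lam * c for c in v] for lam, v in pairs]
product = 1
for lam, _ in pairs:
    product *= lam
print(checks, det3(A), product)
```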


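Exercise 8.6 We have

$$A\mathbf{x} = \begin{pmatrix}-1&1&2\\-6&2&6\\0&1&1\end{pmatrix}\begin{pmatrix}1\\1\\1\end{pmatrix} = \begin{pmatrix}2\\2\\2\end{pmatrix} = 2\begin{pmatrix}1\\1\\1\end{pmatrix},$$

so $\mathbf{x}$ is an eigenvector with corresponding eigenvalue $\lambda = 2$. The characteristic polynomial of $A$ is $p(\lambda) = -\lambda^3 + 2\lambda^2 + \lambda - 2$. Since $\lambda = 2$ is a root, we know that $(\lambda - 2)$ is a factor. Factorising, we obtain

$$p(\lambda) = (\lambda - 2)(-\lambda^2 + 1) = -(\lambda - 2)(\lambda - 1)(\lambda + 1),$$

so the other eigenvalues are $\lambda = 1, -1$. Corresponding eigenvectors are, respectively, $(1, 0, 1)^T$ and $(0, -2, 1)^T$. We may therefore take

$$P = \begin{pmatrix}1&1&0\\0&1&-2\\1&1&1\end{pmatrix}, \qquad D = \mathrm{diag}(1, 2, -1).$$

Check that $AP = (\,(1)\mathbf{v}_1 \;\; 2\mathbf{v}_2 \;\; (-1)\mathbf{v}_3\,) = PD$.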


Exercise 8.7 Expanding the characteristic equation

$$|A - \lambda I| = \begin{vmatrix}-\lambda&0&-2\\1&2-\lambda&1\\1&0&3-\lambda\end{vmatrix} = 0$$

by the first row, you should find that $(\lambda - 2)$ is a common factor in the two terms, so we can keep things simple, factor this out and not have to grapple with a cubic polynomial. The matrix $A$ does not have three distinct eigenvalues. The eigenvalues turn out to be $\lambda_1 = 2$, with multiplicity 2, and $\lambda_3 = 1$. So we first check that we can find two linearly independent eigenvectors for $\lambda_1 = 2$. We solve $(A - 2I)\mathbf{v} = \mathbf{0}$.

$$A - 2I = \begin{pmatrix}-2&0&-2\\1&0&1\\1&0&1\end{pmatrix} \longrightarrow \begin{pmatrix}1&0&1\\0&0&0\\0&0&0\end{pmatrix}.$$

Since the matrix $(A - 2I)$ clearly has rank 1, it is also clear that $\dim(N(A - 2I)) = 2$; the eigenspace has dimension 2. The solutions are

$$\mathbf{v} = s\begin{pmatrix}0\\1\\0\end{pmatrix} + t\begin{pmatrix}-1\\0\\1\end{pmatrix}, \quad s, t \in \mathbb{R}.$$

Let

$$\mathbf{v}_1 = \begin{pmatrix}0\\1\\0\end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix}-1\\0\\1\end{pmatrix}.$$

An eigenvector for $\lambda_3 = 1$ is $\mathbf{v}_3 = (-2, 1, 1)^T$. These three vectors form a linearly independent set. Therefore, we may take

$$P = \begin{pmatrix}-2&-1&0\\1&0&1\\1&1&0\end{pmatrix}, \qquad D = \mathrm{diag}(1, 2, 2).$$

You should check your result by calculating $AP$.

The eigenspace for $\lambda = 2$ is two-dimensional and has a basis consisting of $(-1, 0, 1)^T$ and $(0, 1, 0)^T$. It is a plane in $\mathbb{R}^3$. The eigenspace for $\lambda_3 = 1$ is a line in $\mathbb{R}^3$ with basis $(-2, 1, 1)^T$.


Exercise 8.8 If 0 is an eigenvalue of $A$, then, by definition, there is an
eigenvector $\mathbf{x}$ corresponding to eigenvalue 0. That means $\mathbf{x} \ne \mathbf{0}$ and that
$A\mathbf{x} = 0\mathbf{x} = \mathbf{0}$. So $A\mathbf{x} = \mathbf{0}$ has the non-trivial solution $\mathbf{x}$. Conversely, if
there is some non-trivial solution to $A\mathbf{x} = \mathbf{0}$, then we have a non-zero
$\mathbf{x}$ with $A\mathbf{x} = 0\mathbf{x}$, which means that 0 is an eigenvalue (and $\mathbf{x}$ a
corresponding eigenvector).


Exercise 8.9 Let $\mathbf{v}_1, \mathbf{v}_2$ be two eigenvectors of a matrix $A$ corresponding to eigenvalues $\lambda_1$ and $\lambda_2$ respectively, with $\lambda_1 \ne \lambda_2$. Then $\mathbf{v}_1, \mathbf{v}_2$ are linearly independent. To show this, let

$$a_1\mathbf{v}_1 + a_2\mathbf{v}_2 = \mathbf{0}$$

be a linear combination which is equal to the zero vector. If we can show that this equation only has the trivial solution, $a_1 = a_2 = 0$, then the vectors are linearly independent. Multiply this equation through, first by $\lambda_2$ and then by $A$. Since $A\mathbf{v}_1 = \lambda_1\mathbf{v}_1$ and $A\mathbf{v}_2 = \lambda_2\mathbf{v}_2$, we obtain

$$a_1\lambda_2\mathbf{v}_1 + a_2\lambda_2\mathbf{v}_2 = \mathbf{0}$$

and

$$a_1A\mathbf{v}_1 + a_2A\mathbf{v}_2 = a_1\lambda_1\mathbf{v}_1 + a_2\lambda_2\mathbf{v}_2 = \mathbf{0}.$$

Now, subtracting the first equation from the last equation in the line above, we have

$$a_1(\lambda_1 - \lambda_2)\mathbf{v}_1 = \mathbf{0}.$$

But $\mathbf{v}_1 \ne \mathbf{0}$ since it is an eigenvector, and $(\lambda_1 - \lambda_2) \ne 0$ since $\lambda_1 \ne \lambda_2$. Therefore, $a_1 = 0$. Returning to the equation $a_1\mathbf{v}_1 + a_2\mathbf{v}_2 = \mathbf{0}$, we conclude that $a_2\mathbf{v}_2 = \mathbf{0}$. But again, $\mathbf{v}_2 \ne \mathbf{0}$ since it is an eigenvector, so $a_2 = 0$ and we are done.

To use an inductive argument, we now assume that the statement is true for $n - 1$ eigenvectors and show that this implies it is true for $n$ eigenvectors. In this way, $n = 2 \Rightarrow n = 3 \Rightarrow n = 4 \Rightarrow \cdots$ and so on.

So assume the statement that 'eigenvectors corresponding to $n - 1$ different eigenvalues are linearly independent' is true, and assume we have $n$ eigenvectors, $\mathbf{v}_i$, corresponding to $n$ different eigenvalues, $\lambda_i$. Let

$$a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_{n-1}\mathbf{v}_{n-1} + a_n\mathbf{v}_n = \mathbf{0}$$

be a linear combination which is equal to the zero vector. Multiply this equation through, first by $\lambda_n$ and then by $A$. Since $A\mathbf{v}_i = \lambda_i\mathbf{v}_i$ for each $i$, we have

$$a_1\lambda_n\mathbf{v}_1 + a_2\lambda_n\mathbf{v}_2 + \cdots + a_{n-1}\lambda_n\mathbf{v}_{n-1} + a_n\lambda_n\mathbf{v}_n = \mathbf{0}$$

and

$$a_1A\mathbf{v}_1 + a_2A\mathbf{v}_2 + \cdots + a_{n-1}A\mathbf{v}_{n-1} + a_nA\mathbf{v}_n = a_1\lambda_1\mathbf{v}_1 + a_2\lambda_2\mathbf{v}_2 + \cdots + a_{n-1}\lambda_{n-1}\mathbf{v}_{n-1} + a_n\lambda_n\mathbf{v}_n = \mathbf{0}.$$

Subtracting the first equation from the last equation in the lines above, we have

$$a_1(\lambda_1 - \lambda_n)\mathbf{v}_1 + a_2(\lambda_2 - \lambda_n)\mathbf{v}_2 + \cdots + a_{n-1}(\lambda_{n-1} - \lambda_n)\mathbf{v}_{n-1} = \mathbf{0}.$$

But we have assumed that $n - 1$ eigenvectors corresponding to distinct eigenvalues are linearly independent, so all the coefficients are zero. Since, also, $(\lambda_i - \lambda_n) \ne 0$ for $i = 1, \ldots, n - 1$, we can conclude that $a_1 = a_2 = \cdots = a_{n-1} = 0$. This leaves us with $a_n\mathbf{v}_n = \mathbf{0}$, from which we conclude that $a_n = 0$. Therefore, the $n$ eigenvectors are linearly independent.


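Exercise 8.10 Since $A$ can be diagonalised, we have $P^{-1}AP = D$ for some $P$, where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, these entries being the eigenvalues of $A$. It is given that all $\lambda_i \ge 0$. We have $A = PDP^{-1}$.

If $B^2 = A$, we must have

$$D = P^{-1}AP = P^{-1}B^2P = P^{-1}BPP^{-1}BP = (P^{-1}BP)^2.$$

Therefore, let

$$B = P\,\mathrm{diag}(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_n})\,P^{-1}.$$

Then reversing the above steps,

$$B^2 = P\,\mathrm{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n})\,P^{-1}P\,\mathrm{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n})\,P^{-1} = P\,\mathrm{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n})^2\,P^{-1} = PDP^{-1} = A,$$

and we are done.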
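
To see the construction $B = P\,\mathrm{diag}(\sqrt{\lambda_i})\,P^{-1}$ at work, here is a small sketch on a hypothetical $2 \times 2$ example. The matrices `P`, `P_inv` and the eigenvalues 1 and 4 are chosen for illustration and do not come from the exercise.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

# Hypothetical data: A = P D P^{-1} with D = diag(1, 4).
P = [[1, 1], [0, 1]]
P_inv = [[1, -1], [0, 1]]
sqrt_D = [[1, 0], [0, 2]]          # square roots of the eigenvalues
D = matmul(sqrt_D, sqrt_D)

A = matmul(matmul(P, D), P_inv)
B = matmul(matmul(P, sqrt_D), P_inv)
print(A, B, matmul(B, B) == A)
```

The eigenvalues were picked to be perfect squares so that every entry stays an integer.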


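Chapter 9 exercises


Exercise 9.1 You diagonalised this matrix in Exercise 8.1. If

$$P = \begin{pmatrix}-1&-5\\1&1\end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix}-1&0\\0&3\end{pmatrix},$$

then $P^{-1}AP = D$. Therefore, the matrix $A^n$ is given by

$$A^n = PD^nP^{-1} = \begin{pmatrix}-1&-5\\1&1\end{pmatrix}\begin{pmatrix}(-1)^n&0\\0&3^n\end{pmatrix}\frac{1}{4}\begin{pmatrix}1&5\\-1&-1\end{pmatrix} = \frac{1}{4}\begin{pmatrix}-(-1)^n + 5(3^n) & -5(-1)^n + 5(3^n)\\ (-1)^n - 3^n & 5(-1)^n - 3^n\end{pmatrix}.$$

Incidentally, since the matrix $A$ contains only integer entries, any power of $A$ will also contain only integer entries. Therefore, each of the entries in the expression for $A^n$ is an integer; in particular,

$$\frac{(-1)^n - 3^n}{4}$$

is an integer. (Try this for some number $n$, say $n = 5$.)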
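
The closed form for $A^n$ can be compared with repeated multiplication. The sketch below assumes the matrix $A$ from Exercise 8.1 and uses exact fractions, so the claim that every entry is an integer is tested rather than assumed; `power` and `closed_form` are illustrative names.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

A = [[4, 5], [-1, -2]]

def power(M, n):
    """M^n by repeated multiplication, starting from the identity."""
    result = [[1, 0], [0, 1]]
    for _ in range(n):
        result = matmul(result, M)
    return result

def closed_form(n):
    """A^n from the diagonalisation A = P D P^{-1}."""
    a, b = (-1) ** n, 3 ** n
    entries = [[-a + 5 * b, -5 * a + 5 * b], [a - b, 5 * a - b]]
    return [[Fraction(e, 4) for e in row] for row in entries]

agree = all(power(A, n) == closed_form(n) for n in range(1, 9))
print(agree, closed_form(5)[1][0])
```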


Exercise 9.2 We solve this using matrix powers. We could, of course, use a change of variable instead. Notice that the system can be written as

$$\mathbf{x}_{t+1} = \begin{pmatrix}1&4\\ \frac12&0\end{pmatrix}\mathbf{x}_t, \quad\text{where}\quad \mathbf{x}_t = \begin{pmatrix}x_t\\y_t\end{pmatrix}.$$

This is $\mathbf{x}_{t+1} = A\mathbf{x}_t$, where $A$ is the matrix whose powers we calculated in Example 9.4. The solution (using the result from Example 9.4) is

$$\mathbf{x}_t = A^t\mathbf{x}_0 = \frac{1}{6}\begin{pmatrix}2(-1)^t + 4(2^t) & -8(-1)^t + 8(2^t)\\ -(-1)^t + 2^t & 4(-1)^t + 2(2^t)\end{pmatrix}\begin{pmatrix}1000\\1000\end{pmatrix} = \begin{pmatrix}-1000(-1)^t + 2000(2^t)\\ 500(-1)^t + 500(2^t)\end{pmatrix}.$$

That is,

$$x_t = -1000(-1)^t + 2000(2^t), \qquad y_t = 500(-1)^t + 500(2^t).$$


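Exercise 9.3 The system of difference equations can be expressed as $\mathbf{x}_{t+1} = A\mathbf{x}_t$, where

$$A = \begin{pmatrix}7&0&-3\\1&6&5\\5&0&-1\end{pmatrix}, \qquad \mathbf{x}_t = \begin{pmatrix}x_t\\y_t\\z_t\end{pmatrix}.$$

You need to diagonalise $A$. Expanding the determinant by column two,

$$|A - \lambda I| = \begin{vmatrix}7-\lambda&0&-3\\1&6-\lambda&5\\5&0&-1-\lambda\end{vmatrix} = (6 - \lambda)(\lambda^2 - 6\lambda + 8),$$

so the eigenvalues are $\lambda = 6, 4, 2$. Next find an eigenvector for each eigenvalue. You should find that if, for example, you set

$$P = \begin{pmatrix}1&0&3\\-3&1&-7\\1&0&5\end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix}4&0&0\\0&6&0\\0&0&2\end{pmatrix}, \quad\text{then}\quad P^{-1}AP = D.$$

Then the solution to the system of difference equations is given by $\mathbf{x}_t = PD^tP^{-1}\mathbf{x}_0$, so it only remains to find $P^{-1}$ and then multiply the matrices. But before that, you should check that $AP = PD$ so that you know that your eigenvalues and eigenvectors are correct!

$$AP = \begin{pmatrix}7&0&-3\\1&6&5\\5&0&-1\end{pmatrix}\begin{pmatrix}1&0&3\\-3&1&-7\\1&0&5\end{pmatrix} = \begin{pmatrix}4&0&6\\-12&6&-14\\4&0&10\end{pmatrix} = (\,4\mathbf{v}_1 \;\; 6\mathbf{v}_2 \;\; 2\mathbf{v}_3\,).$$

Since the eigenvalues are distinct, you know that your eigenvectors are linearly independent (as long as you have chosen one eigenvector for each eigenvalue). Then $P$ will be invertible, and using either row operations or the cofactor method,

$$P^{-1} = \frac{1}{2}\begin{pmatrix}5&0&-3\\8&2&-2\\-1&0&1\end{pmatrix}.$$

Then $\mathbf{x}_t = PD^tP^{-1}\mathbf{x}_0$,

$$\begin{pmatrix}x_t\\y_t\\z_t\end{pmatrix} = \begin{pmatrix}1&0&3\\-3&1&-7\\1&0&5\end{pmatrix}\begin{pmatrix}4^t&0&0\\0&6^t&0\\0&0&2^t\end{pmatrix}\begin{pmatrix}-4\\-3\\1\end{pmatrix} = \begin{pmatrix}1&0&3\\-3&1&-7\\1&0&5\end{pmatrix}\begin{pmatrix}-4(4^t)\\-3(6^t)\\2^t\end{pmatrix}.$$

Therefore, the required sequences are

$$x_t = -4(4^t) + 3(2^t), \qquad y_t = 12(4^t) - 3(6^t) - 7(2^t), \qquad z_t = -4(4^t) + 5(2^t), \qquad \text{for } t \in \mathbb{Z},\ t \ge 0.$$

Notice that you will get the same answer even if you used a different (correct) matrix $P$ and corresponding matrix $D$.

To check your result, first see that the initial conditions are satisfied. Substituting $t = 0$ into the equations, we get $x_0 = -1$, $y_0 = 2$, $z_0 = 1$ as required. Next look at $\mathbf{x}_1$. Using the original difference equations,

$$\mathbf{x}_1 = A\mathbf{x}_0 = \begin{pmatrix}7&0&-3\\1&6&5\\5&0&-1\end{pmatrix}\begin{pmatrix}-1\\2\\1\end{pmatrix} = \begin{pmatrix}-10\\16\\-6\end{pmatrix}.$$

From the solutions,

$$x_1 = -4(4) + 3(2) = -10, \qquad y_1 = 12(4) - 3(6) - 7(2) = 16, \qquad z_1 = -4(4) + 5(2) = -6.$$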
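
The check at $t = 0$ and $t = 1$ extends to as many terms as you like by iterating $\mathbf{x}_{t+1} = A\mathbf{x}_t$ directly. This is a minimal sketch using the matrix and initial vector of Exercise 9.3; the function names are illustrative.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

A = [[7, 0, -3], [1, 6, 5], [5, 0, -1]]

def closed_form(t):
    """The sequences x_t, y_t, z_t found by diagonalisation."""
    return [-4 * 4 ** t + 3 * 2 ** t,
            12 * 4 ** t - 3 * 6 ** t - 7 * 2 ** t,
            -4 * 4 ** t + 5 * 2 ** t]

x = [-1, 2, 1]                 # initial conditions x_0
matches = []
for t in range(8):
    matches.append(x == closed_form(t))
    x = matvec(A, x)           # advance one step of the recurrence
print(all(matches), closed_form(1))
```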
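Exercise 9.4 Eigenvectors are given, so there is no need to determine the characteristic polynomial to find the eigenvalues. Simply multiply $A$ times the given eigenvectors in turn (or you can do all three at once by multiplying $AP$). For example,

$$A\mathbf{v}_1 = A\begin{pmatrix}1\\-1\\1\end{pmatrix} = \begin{pmatrix}-3\\3\\-3\end{pmatrix} = -3\mathbf{v}_1,$$

so $-3$ is an eigenvalue and $\mathbf{v}_1$ is a corresponding eigenvector. The other two are eigenvectors for eigenvalue 3. Since $\mathbf{v}_2$ and $\mathbf{v}_3$ are clearly linearly independent (neither being a scalar multiple of the other), if

$$P = \begin{pmatrix}1&-3&-1\\-1&0&1\\1&1&0\end{pmatrix}, \quad\text{then}\quad P^{-1}AP = \mathrm{diag}(-3, 3, 3) = D.$$

The system of difference equations is $\mathbf{x}_{t+1} = A\mathbf{x}_t$. Let $\mathbf{u}_t = (u_t, v_t, w_t)^T$ be given by $\mathbf{u}_t = P^{-1}\mathbf{x}_t$. Then the system is equivalent to $\mathbf{u}_{t+1} = D\mathbf{u}_t$, which is

$$u_{t+1} = -3u_t, \qquad v_{t+1} = 3v_t, \qquad w_{t+1} = 3w_t.$$

This has solutions

$$u_t = (-3)^tu_0, \qquad v_t = 3^tv_0, \qquad w_t = 3^tw_0.$$

We have to find $u_0, v_0, w_0$. Now $\mathbf{u}_0 = (u_0, v_0, w_0)^T = P^{-1}\mathbf{x}_0$, and (as can be determined by the usual methods),

$$P^{-1} = \begin{pmatrix}1/3&1/3&1\\-1/3&-1/3&0\\1/3&4/3&1\end{pmatrix},$$

so

$$\mathbf{u}_0 = P^{-1}\mathbf{x}_0 = \begin{pmatrix}1/3&1/3&1\\-1/3&-1/3&0\\1/3&4/3&1\end{pmatrix}\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}2/3\\-2/3\\5/3\end{pmatrix}.$$

The solution $\mathbf{x}_t$ is therefore

$$\mathbf{x}_t = P\mathbf{u}_t = \begin{pmatrix}1&-3&-1\\-1&0&1\\1&1&0\end{pmatrix}\begin{pmatrix}(2/3)(-3)^t\\(-2/3)3^t\\(5/3)3^t\end{pmatrix} = \begin{pmatrix}(2/3)(-3)^t + (1/3)3^t\\-(2/3)(-3)^t + (5/3)3^t\\(2/3)(-3)^t - (2/3)3^t\end{pmatrix}.$$

The term $x_5 = (2/3)(-3)^5 + (1/3)3^5 = 2(-81) + (81) = -81$.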




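Exercise 9.5 This is a Markov process, as it consists of a total population distributed into two states, and the matrix $A$ satisfies the criteria to be a transition matrix: (1) the entries are positive and (2) the sum of the entries in each column is 1.

Interpreting the system, each year 40% of those living by the sea move to the oasis (60% remain) and 20% of those living in the oasis move to the sea.

To solve the system, we need to diagonalise the matrix $A$. First find the eigenvalues:

$$|A - \lambda I| = \begin{vmatrix}0.6-\lambda&0.2\\0.4&0.8-\lambda\end{vmatrix} = 0.48 - 1.4\lambda + \lambda^2 - 0.08 = \lambda^2 - 1.4\lambda + 0.4 = (\lambda - 1)(\lambda - 0.4) = 0,$$

so $\lambda = 1$ and $\lambda = 0.4$ are the eigenvalues. We find corresponding eigenvectors by solving $(A - \lambda I)\mathbf{v} = \mathbf{0}$:

$$\lambda_1 = 1: \quad A - I = \begin{pmatrix}-0.4&0.2\\0.4&-0.2\end{pmatrix} \longrightarrow \begin{pmatrix}1&-\frac12\\0&0\end{pmatrix} \quad\Longrightarrow\quad \mathbf{v}_1 = \begin{pmatrix}1\\2\end{pmatrix},$$

$$\lambda_2 = 0.4: \quad A - 0.4I = \begin{pmatrix}0.2&0.2\\0.4&0.4\end{pmatrix} \longrightarrow \begin{pmatrix}1&1\\0&0\end{pmatrix} \quad\Longrightarrow\quad \mathbf{v}_2 = \begin{pmatrix}-1\\1\end{pmatrix}.$$

Then $\mathbf{x}_t = PD^tP^{-1}\mathbf{x}_0$. The initial distribution is $\mathbf{x}_0 = (0.5, 0.5)^T$,

$$\begin{pmatrix}x_t\\y_t\end{pmatrix} = \begin{pmatrix}1&-1\\2&1\end{pmatrix}\begin{pmatrix}1&0\\0&(0.4)^t\end{pmatrix}\frac{1}{3}\begin{pmatrix}1&1\\-2&1\end{pmatrix}\begin{pmatrix}0.5\\0.5\end{pmatrix} = \begin{pmatrix}1&-1\\2&1\end{pmatrix}\begin{pmatrix}1&0\\0&(0.4)^t\end{pmatrix}\begin{pmatrix}\frac13\\-\frac16\end{pmatrix} = \frac{1}{3}\begin{pmatrix}1\\2\end{pmatrix} - \frac{1}{6}(0.4)^t\begin{pmatrix}-1\\1\end{pmatrix}.$$

The expressions for $x_t$ and $y_t$ are

$$x_t = \frac13 + \frac16(0.4)^t, \qquad y_t = \frac23 - \frac16(0.4)^t.$$

As $t \to \infty$, $\mathbf{x}_t \to (1/3, 2/3)^T$. In terms of the original total population of 210 inhabitants, we multiply $\mathbf{x}_t$ by 210, so the long-term population distribution is 70 inhabitants living by the sea and 140 inhabitants living in the oasis.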
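
The behaviour of this Markov process can be watched directly by iterating the transition matrix. The sketch below assumes $A$ and $\mathbf{x}_0$ as in Exercise 9.5 and uses exact fractions, so the comparison with the closed form is an equality rather than an approximation; `step` and `closed_form` are illustrative names.

```python
from fractions import Fraction as F

A = [[F(3, 5), F(1, 5)], [F(2, 5), F(4, 5)]]   # columns sum to 1

def step(x):
    """One year of movement: x_{t+1} = A x_t."""
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

def closed_form(t):
    """x_t and y_t obtained from the diagonalisation."""
    decay = F(2, 5) ** t
    return [F(1, 3) + decay / 6, F(2, 3) - decay / 6]

x = [F(1, 2), F(1, 2)]
matches = []
for t in range(10):
    matches.append(x == closed_form(t))
    x = step(x)

long_term = [210 * F(1, 3), 210 * F(2, 3)]
print(all(matches), long_term)
```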


Exercise 9.6 (a) The matrix $B$ is a scalar multiple of $A$, $B = 10A$. Let $\lambda$ be an eigenvalue of $B$ with corresponding eigenvector $\mathbf{v}$, so that $B\mathbf{v} = \lambda\mathbf{v}$. Then substituting $10A$ for $B$, we have $10A\mathbf{v} = \lambda\mathbf{v}$ and so

$$A\mathbf{v} = \frac{\lambda}{10}\mathbf{v}.$$

Therefore, $A$ and $B$ have the same eigenvectors, $\mathbf{v}$, and $\lambda/10$ is the corresponding eigenvalue of $A$.

The matrix $A$ is the transition matrix of a Markov chain because:

1. All the entries are non-negative ($a_{ij} \ge 0$).
2. The sum of the entries in each column is 1.

Since $\lambda = 1$ is an eigenvalue of a Markov chain, we can deduce that $10\lambda = 10$ is an eigenvalue of $B$.

(b) To find an eigenvector for $\lambda = 10$, solve $(B - 10I)\mathbf{x} = \mathbf{0}$ by reducing $B - 10I$:

$$\begin{pmatrix}-3&2&2\\0&-8&4\\3&6&-6\end{pmatrix} \longrightarrow \cdots \longrightarrow \begin{pmatrix}1&0&-1\\0&1&-\frac12\\0&0&0\end{pmatrix}.$$

So an eigenvector for $\lambda = 10$ is $\mathbf{v}_1 = (2, 1, 2)^T$.

To find the other eigenvalues, we find the characteristic equation. Expanding the determinant by the first column,

$$|B - \lambda I| = \begin{vmatrix}7-\lambda&2&2\\0&2-\lambda&4\\3&6&4-\lambda\end{vmatrix} = (7 - \lambda)(\lambda^2 - 6\lambda - 16) + 3(2\lambda + 4) = 0.$$

Factoring the quadratic, there is a common factor of $\lambda + 2$ in the two terms, which can be factored out, avoiding a cubic equation. We have

$$|B - \lambda I| = -(\lambda + 2)[(\lambda - 7)(\lambda - 8) - 6] = -(\lambda + 2)(\lambda^2 - 15\lambda + 50) = -(\lambda + 2)(\lambda - 10)(\lambda - 5).$$

So the eigenvalues are $\lambda = 10, 5, -2$.

We then find the corresponding eigenvectors. Solving $(B - 5I)\mathbf{v} = \mathbf{0}$, by reducing $B - 5I$,

$$\begin{pmatrix}2&2&2\\0&-3&4\\3&6&-1\end{pmatrix} \longrightarrow \cdots \longrightarrow \begin{pmatrix}1&0&\frac73\\0&1&-\frac43\\0&0&0\end{pmatrix}.$$

So an eigenvector for $\lambda = 5$ is $\mathbf{v}_2 = (-7, 4, 3)^T$.

For $\lambda = -2$, we have

$$B + 2I = \begin{pmatrix}9&2&2\\0&4&4\\3&6&6\end{pmatrix} \longrightarrow \begin{pmatrix}1&2&2\\0&1&1\\0&0&0\end{pmatrix} \longrightarrow \begin{pmatrix}1&0&0\\0&1&1\\0&0&0\end{pmatrix}.$$

So an eigenvector for $\lambda = -2$ is $\mathbf{v}_3 = (0, -1, 1)^T$. If

$$P = \begin{pmatrix}2&-7&0\\1&4&-1\\2&3&1\end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix}10&0&0\\0&5&0\\0&0&-2\end{pmatrix},$$

then $P^{-1}BP = D$. The eigenvectors and eigenvalues must be listed in corresponding columns.

To check,

$$BP = \begin{pmatrix}7&2&2\\0&2&4\\3&6&4\end{pmatrix}\begin{pmatrix}2&-7&0\\1&4&-1\\2&3&1\end{pmatrix} = \begin{pmatrix}20&-35&0\\10&20&2\\20&15&-2\end{pmatrix} = PD.$$

Why are you being asked to check? So that you know you do have the correct eigenvalues and eigenvectors. This gives you an opportunity to look for and correct any minor mistakes you may have made.

By part (a), the eigenvalues and corresponding eigenvectors of $A$ are $\lambda = 1$ with eigenvector $\mathbf{v}_1$, $\lambda = 0.5$ with corresponding eigenvector $\mathbf{v}_2$ and $\lambda = -0.2$ with corresponding eigenvector $\mathbf{v}_3$.

(c) The long-term distribution of a Markov chain is given by the eigenvector for $\lambda = 1$. Therefore, the distribution is proportional to the entries of the vector $\mathbf{v}_1$:

$$\frac{1}{5}\begin{pmatrix}2\\1\\2\end{pmatrix}1000 = \begin{pmatrix}400\\200\\400\end{pmatrix}.$$

That is, 400 will be employed full-time, 200 will be employed part-time and 400 will remain unemployed. So a total of 600 will be employed.

Notice that you did not need to use the initial conditions and you did not need to find the solution to $\mathbf{x}_t = A\mathbf{x}_{t-1}$ to answer this question. This would have been a perfectly acceptable method, but one which would take much more time. You only needed to know that since $(0.5)^t \to 0$ and $(-0.2)^t \to 0$ as $t \to \infty$, the eigenvector corresponding to $\lambda = 1$ will give the long-term distribution. It must be a distribution vector; that is, the components of the column vector must sum to 1, which is why we needed the factor 1/5. When multiplied by the total population of 1,000, this gives the distribution of workers.


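Exercise 9.7 We can express the system of differential equations in matrix form as $\mathbf{y}' = A\mathbf{y}$, where

$$A = \begin{pmatrix}4&5\\-1&-2\end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix}y_1(t)\\y_2(t)\end{pmatrix}.$$

This is the matrix you diagonalised in Exercise 8.1 (and used in Exercise 9.1). If

$$P = \begin{pmatrix}-1&-5\\1&1\end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix}-1&0\\0&3\end{pmatrix}, \quad\text{then}\quad P^{-1}AP = D.$$

To solve $\mathbf{y}' = A\mathbf{y}$, you set $\mathbf{y} = P\mathbf{z}$ to define new functions $\mathbf{z} = (z_1(t), z_2(t))^T$. Then $\mathbf{y}' = (P\mathbf{z})' = P\mathbf{z}'$ and $A\mathbf{y} = A(P\mathbf{z}) = AP\mathbf{z}$, so that

$$\mathbf{y}' = A\mathbf{y} \iff P\mathbf{z}' = AP\mathbf{z} \iff \mathbf{z}' = P^{-1}AP\mathbf{z} = D\mathbf{z}.$$

The system $\mathbf{z}' = D\mathbf{z}$ is uncoupled; the equations are

$$z_1' = -z_1, \qquad z_2' = 3z_2$$

with solutions

$$z_1(t) = z_1(0)e^{-t}, \qquad z_2(t) = z_2(0)e^{3t}.$$

To find $z_1(0), z_2(0)$, we use $\mathbf{z} = P^{-1}\mathbf{y}$. Since $y_1(0) = 2$, $y_2(0) = 6$, we have

$$\begin{pmatrix}z_1(0)\\z_2(0)\end{pmatrix} = \frac{1}{4}\begin{pmatrix}1&5\\-1&-1\end{pmatrix}\begin{pmatrix}2\\6\end{pmatrix} = \begin{pmatrix}8\\-2\end{pmatrix}.$$

So the solution of the original system is

$$\mathbf{y} = P\mathbf{z} = \begin{pmatrix}-1&-5\\1&1\end{pmatrix}\begin{pmatrix}8e^{-t}\\-2e^{3t}\end{pmatrix};$$

that is,

$$y_1(t) = -8e^{-t} + 10e^{3t}, \qquad y_2(t) = 8e^{-t} - 2e^{3t}.$$

To check the solution, we first note that the initial conditions are satisfied by substituting $t = 0$ into the equations,

$$y_1(0) = -8(1) + 10(1) = 2 \quad\text{and}\quad y_2(0) = 8(1) - 2(1) = 6.$$

Next check that you obtain the same values for $\mathbf{y}'(0)$ from both the original equations and the derivatives of the solutions. From the original system,

$$\begin{pmatrix}y_1'(0)\\y_2'(0)\end{pmatrix} = \begin{pmatrix}4&5\\-1&-2\end{pmatrix}\begin{pmatrix}2\\6\end{pmatrix} = \begin{pmatrix}38\\-14\end{pmatrix}.$$

Differentiating the solution functions:

$$y_1'(t) = 8e^{-t} + 30e^{3t} \;\Longrightarrow\; y_1'(0) = 38, \qquad y_2'(t) = -8e^{-t} - 6e^{3t} \;\Longrightarrow\; y_2'(0) = -14.$$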
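
The check at $t = 0$ can be repeated at other times: the derivative of the solution should equal $A\mathbf{y}$ everywhere. Here is a rough numerical sketch, assuming the solution functions of Exercise 9.7; the sample times and tolerances are arbitrary choices.

```python
import math

A = [[4, 5], [-1, -2]]

def y(t):
    """The solution (y1(t), y2(t)) found by diagonalisation."""
    return (-8 * math.exp(-t) + 10 * math.exp(3 * t),
            8 * math.exp(-t) - 2 * math.exp(3 * t))

def y_prime(t):
    """Term-by-term derivative of the solution."""
    return (8 * math.exp(-t) + 30 * math.exp(3 * t),
            -8 * math.exp(-t) - 6 * math.exp(3 * t))

def rhs(t):
    """Right-hand side A y(t) of the system."""
    y1, y2 = y(t)
    return (A[0][0] * y1 + A[0][1] * y2, A[1][0] * y1 + A[1][1] * y2)

ok = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
         for t in (0.0, 0.5, 1.0)
         for a, b in zip(y_prime(t), rhs(t)))
print(y(0.0), y_prime(0.0), ok)
```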


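Exercise 9.8 This system of differential equations can be expressed as $\mathbf{y}' = A\mathbf{y}$, where

$$A = \begin{pmatrix}-1&1&2\\-6&2&6\\0&1&1\end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix}y_1\\y_2\\y_3\end{pmatrix}.$$

You diagonalised this matrix in Exercise 8.6. Using this result, if

$$P = \begin{pmatrix}1&1&0\\0&1&-2\\1&1&1\end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix}1&0&0\\0&2&0\\0&0&-1\end{pmatrix},$$

then $P^{-1}AP = D$. To find the general solution of the system of differential equations, we define new functions $\mathbf{z} = (z_1(t), z_2(t), z_3(t))^T$ by setting $\mathbf{y} = P\mathbf{z}$. Then substituting into $\mathbf{y}' = A\mathbf{y}$, we have

$$\mathbf{y}' = (P\mathbf{z})' = P\mathbf{z}' = A\mathbf{y} = A(P\mathbf{z}) = AP\mathbf{z}$$

and hence

$$\mathbf{z}' = P^{-1}AP\mathbf{z} = D\mathbf{z}.$$

The general solution of $\mathbf{z}' = D\mathbf{z}$ is

$$\begin{pmatrix}z_1\\z_2\\z_3\end{pmatrix} = \begin{pmatrix}\alpha e^{t}\\\beta e^{2t}\\\gamma e^{-t}\end{pmatrix},$$

so the general solution of the original system is given by

$$\mathbf{y} = \begin{pmatrix}y_1\\y_2\\y_3\end{pmatrix} = P\mathbf{z} = \begin{pmatrix}1&1&0\\0&1&-2\\1&1&1\end{pmatrix}\begin{pmatrix}\alpha e^{t}\\\beta e^{2t}\\\gamma e^{-t}\end{pmatrix};$$

that is,

$$y_1(t) = \alpha e^{t} + \beta e^{2t}, \qquad y_2(t) = \beta e^{2t} - 2\gamma e^{-t}, \qquad y_3(t) = \alpha e^{t} + \beta e^{2t} + \gamma e^{-t}$$

for arbitrary constants $\alpha, \beta, \gamma \in \mathbb{R}$.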


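Exercise 9.9 The system of differential equations is $\mathbf{y}' = A\mathbf{y}$, where $A$ is the matrix

$$A = \begin{pmatrix}4&0&4\\0&4&4\\4&4&8\end{pmatrix}.$$

This is the matrix we used in the example of Section 9.2.5 (and which we diagonalised in Example 8.23), so we have already done most of the work for this solution. Using the same matrices $P$ and $D$ as we did there, we set $\mathbf{y} = P\mathbf{z}$ to define new functions $\mathbf{z} = (z_1(t), z_2(t), z_3(t))^T$, and then find the solutions to $\mathbf{z}' = D\mathbf{z}$. We have

$$\begin{pmatrix}z_1'\\z_2'\\z_3'\end{pmatrix} = \begin{pmatrix}4&0&0\\0&0&0\\0&0&12\end{pmatrix}\begin{pmatrix}z_1\\z_2\\z_3\end{pmatrix}$$

with solutions:

$$z_1(t) = z_1(0)e^{4t}, \qquad z_2(t) = z_2(0)e^{0t} = z_2(0), \qquad z_3(t) = z_3(0)e^{12t}.$$

Since the initial conditions are essentially the same, $\mathbf{z}(0) = P^{-1}\mathbf{y}(0)$,

$$\begin{pmatrix}z_1(0)\\z_2(0)\\z_3(0)\end{pmatrix} = \frac{1}{6}\begin{pmatrix}-3&3&0\\-2&-2&2\\1&1&2\end{pmatrix}\begin{pmatrix}6\\12\\12\end{pmatrix} = \begin{pmatrix}3\\-2\\7\end{pmatrix},$$

and the solutions are given by

$$\begin{pmatrix}y_1\\y_2\\y_3\end{pmatrix} = \begin{pmatrix}-1&-1&1\\1&-1&1\\0&1&2\end{pmatrix}\begin{pmatrix}3e^{4t}\\-2\\7e^{12t}\end{pmatrix};$$

that is,

$$y_1(t) = -3e^{4t} + 2 + 7e^{12t}, \qquad y_2(t) = 3e^{4t} + 2 + 7e^{12t}, \qquad y_3(t) = -2 + 14e^{12t}.$$

Notice that the eigenvalue $\lambda = 0$ causes no problems here. You can check the result using $\mathbf{y}(0)$ and $\mathbf{y}'(0)$.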
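Exercise 9.10 To answer the first question, put $A$ into reduced row echelon form,

$$A = \begin{pmatrix}5&-8&-4\\3&-5&-3\\-1&2&2\end{pmatrix} \longrightarrow \cdots \longrightarrow \begin{pmatrix}1&0&4\\0&1&3\\0&0&0\end{pmatrix}.$$

Setting the non-leading variable $z$ to be equal to $t$, a general solution of the system of equations $A\mathbf{x} = \mathbf{0}$ is

$$\mathbf{x} = \begin{pmatrix}-4t\\-3t\\t\end{pmatrix} = t\begin{pmatrix}-4\\-3\\1\end{pmatrix} = t\mathbf{v}_3, \quad t \in \mathbb{R}.$$

So a basis of $N(A)$ is $\{\mathbf{v}_3\}$.

To show that $\mathbf{v}_1$ is an eigenvector of $A$,

$$A\mathbf{v}_1 = \begin{pmatrix}5&-8&-4\\3&-5&-3\\-1&2&2\end{pmatrix}\begin{pmatrix}2\\1\\0\end{pmatrix} = \begin{pmatrix}2\\1\\0\end{pmatrix} = \mathbf{v}_1,$$

so $\mathbf{v}_1$ is an eigenvector with corresponding eigenvalue $\lambda = 1$.

Next find all the eigenvectors of $A$ which correspond to $\lambda = 1$ by solving $(A - I)\mathbf{v} = \mathbf{0}$:

$$A - I = \begin{pmatrix}4&-8&-4\\3&-6&-3\\-1&2&1\end{pmatrix} \longrightarrow \begin{pmatrix}1&-2&-1\\0&0&0\\0&0&0\end{pmatrix},$$

with solution

$$\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}2s + t\\s\\t\end{pmatrix} = s\begin{pmatrix}2\\1\\0\end{pmatrix} + t\begin{pmatrix}1\\0\\1\end{pmatrix} = s\mathbf{v}_1 + t\mathbf{v}_2, \quad s, t \in \mathbb{R}.$$

Therefore, the matrix $A$ has an eigenvalue $\lambda = 1$ of multiplicity 2 with two linearly independent eigenvectors. Since $A\mathbf{x} = \mathbf{0}$ has a non-trivial solution, we know that $\lambda = 0$ is the third eigenvalue of $A$, with corresponding eigenvector $\mathbf{v}_3$. Then an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$ are given by

$$P = \begin{pmatrix}2&1&-4\\1&0&-3\\0&1&1\end{pmatrix}, \qquad D = \begin{pmatrix}1&0&0\\0&1&0\\0&0&0\end{pmatrix}.$$

You should now check that the eigenvectors are correct by showing $AP = PD$.

Using this diagonalisation, $A = PDP^{-1}$, so that $A^n = PD^nP^{-1}$. But $D^n = D$ for $n \ge 1$, since the entries on the diagonal are either 1 or 0. Therefore,

$$A^n = PD^nP^{-1} = PDP^{-1} = A.$$

A matrix with the property that $A^n = A$ for all $n \ge 1$ is said to be idempotent (meaning the same for all powers). We shall see more about these in Chapter 12.

The solution to the system of difference equations given by $\mathbf{x}_{t+1} = A\mathbf{x}_t$ is $\mathbf{x}_t = A^t\mathbf{x}_0$. So for all $t \ge 1$, this is just $\mathbf{x}_t = A\mathbf{x}_0$. Given the initial conditions, we have

$$\mathbf{x}_t = A\mathbf{x}_0 = \begin{pmatrix}5&-8&-4\\3&-5&-3\\-1&2&2\end{pmatrix}\begin{pmatrix}1\\1\\1\end{pmatrix} = \begin{pmatrix}-7\\-5\\3\end{pmatrix}, \quad t \ge 1.$$

Therefore, the sequences are:

$$x_t : \{1, -7, -7, -7, \ldots\}, \qquad y_t : \{1, -5, -5, -5, \ldots\}, \qquad z_t : \{1, 3, 3, 3, \ldots\}.$$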

Chapter 10 exercises 


Exercise 10.1 Property (iii) of the definition of inner product follows from the fact that

$$\langle A, A \rangle = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}^2 \ge 0$$

is a sum of non-negative numbers, and this sum equals 0 if and only if for every $i$ and every $j$, $a_{ij} = 0$, which means that $A$ is the zero matrix, which in this vector space is the zero vector. Property (i) is easy to verify, as also is (ii).


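Exercise 10.2 We have:

$$\begin{aligned}
\|\mathbf{x} + \mathbf{y}\|^2 + \|\mathbf{x} - \mathbf{y}\|^2 &= \langle \mathbf{x} + \mathbf{y}, \mathbf{x} + \mathbf{y} \rangle + \langle \mathbf{x} - \mathbf{y}, \mathbf{x} - \mathbf{y} \rangle \\
&= \langle \mathbf{x}, \mathbf{x} \rangle + 2\langle \mathbf{x}, \mathbf{y} \rangle + \langle \mathbf{y}, \mathbf{y} \rangle + \langle \mathbf{x}, \mathbf{x} \rangle - 2\langle \mathbf{x}, \mathbf{y} \rangle + \langle \mathbf{y}, \mathbf{y} \rangle \\
&= 2\langle \mathbf{x}, \mathbf{x} \rangle + 2\langle \mathbf{y}, \mathbf{y} \rangle \\
&= 2\|\mathbf{x}\|^2 + 2\|\mathbf{y}\|^2.
\end{aligned}$$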

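Exercise 10.3 The set $W \ne \emptyset$ because $\mathbf{0} \in W$ since $\langle \mathbf{0}, \mathbf{v} \rangle = 0$. Suppose $\mathbf{x}, \mathbf{y} \in W$ and $\alpha, \beta \in \mathbb{R}$. Because $\mathbf{x} \perp \mathbf{v}$ and $\mathbf{y} \perp \mathbf{v}$, we have (by definition) $\langle \mathbf{x}, \mathbf{v} \rangle = \langle \mathbf{y}, \mathbf{v} \rangle = 0$. Therefore,

$$\langle \alpha\mathbf{x} + \beta\mathbf{y}, \mathbf{v} \rangle = \alpha\langle \mathbf{x}, \mathbf{v} \rangle + \beta\langle \mathbf{y}, \mathbf{v} \rangle = \alpha(0) + \beta(0) = 0,$$

and hence $\alpha\mathbf{x} + \beta\mathbf{y} \perp \mathbf{v}$; that is, $\alpha\mathbf{x} + \beta\mathbf{y} \in W$. Therefore, $W$ is a subspace. In fact, $W$ is the set $\{\mathbf{x} \mid \langle \mathbf{x}, \mathbf{v} \rangle = 0\}$, which is the hyperplane through the origin with normal vector $\mathbf{v}$.

The proof that $S^\perp$ is a subspace is similar. The vector $\mathbf{0}$ is in $S^\perp$ since $\langle \mathbf{0}, \mathbf{v} \rangle = 0$ for all $\mathbf{v} \in S$, so $S^\perp$ is non-empty. If $\mathbf{x}, \mathbf{y} \in S^\perp$ and $\alpha, \beta \in \mathbb{R}$, then $\mathbf{x}$ and $\mathbf{y}$ are each orthogonal to all the vectors in $S$; that is, if $\mathbf{v} \in S$, then $\langle \mathbf{x}, \mathbf{v} \rangle = \langle \mathbf{y}, \mathbf{v} \rangle = 0$. Therefore,

$$\langle \alpha\mathbf{x} + \beta\mathbf{y}, \mathbf{v} \rangle = \alpha\langle \mathbf{x}, \mathbf{v} \rangle + \beta\langle \mathbf{y}, \mathbf{v} \rangle = \alpha(0) + \beta(0) = 0 \quad\text{for all } \mathbf{v} \in S.$$

Therefore, $\alpha\mathbf{x} + \beta\mathbf{y} \in S^\perp$, so $S^\perp$ is a subspace of $\mathbb{R}^n$. The subspace $S^\perp$ is known as the orthogonal complement of $S$.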


Exercise 10.4 If $P$ is an orthogonal matrix, then $P^TP = I$. Using the fact that the product of two $n \times n$ determinants is the determinant of the product, we have

$$\det(P^T)\det(P) = \det(P^TP) = \det(I) = 1.$$

But $\det(P^T) = \det(P)$, so this becomes $(\det(P))^2 = 1$, which implies that $\det(P) = \pm 1$.


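Exercise 10.5 To show this is an inner product, we need to show it satisfies each of the three properties in the definition.

(i) You can show that $\langle \mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{y}, \mathbf{x} \rangle$ for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^2$ by letting $\mathbf{x} = (x_1, x_2)^T$, $\mathbf{y} = (y_1, y_2)^T$, multiplying out the matrix product $\mathbf{x}^TA\mathbf{y}$ and using properties of real numbers. But there is an easier way. The given matrix $A$ is symmetric, so $A^T = A$. Since $\mathbf{x}^TA\mathbf{y}$ is a $1 \times 1$ matrix, it is also symmetric. Therefore,

$$\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^TA\mathbf{y} = (\mathbf{x}^TA\mathbf{y})^T = \mathbf{y}^TA^T\mathbf{x} = \mathbf{y}^TA\mathbf{x} = \langle \mathbf{y}, \mathbf{x} \rangle.$$

(ii) We next show that $\langle \alpha\mathbf{x} + \beta\mathbf{y}, \mathbf{z} \rangle = \alpha\langle \mathbf{x}, \mathbf{z} \rangle + \beta\langle \mathbf{y}, \mathbf{z} \rangle$ for all $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^2$ and all $\alpha, \beta \in \mathbb{R}$. This follows from the rules of matrix algebra:

$$\langle \alpha\mathbf{x} + \beta\mathbf{y}, \mathbf{z} \rangle = (\alpha\mathbf{x} + \beta\mathbf{y})^TA\mathbf{z} = (\alpha\mathbf{x}^T + \beta\mathbf{y}^T)A\mathbf{z} = \alpha\mathbf{x}^TA\mathbf{z} + \beta\mathbf{y}^TA\mathbf{z} = \alpha\langle \mathbf{x}, \mathbf{z} \rangle + \beta\langle \mathbf{y}, \mathbf{z} \rangle.$$

(iii) Finally, we need to show that $\langle \mathbf{x}, \mathbf{x} \rangle \ge 0$ for all $\mathbf{x} \in \mathbb{R}^2$, and $\langle \mathbf{x}, \mathbf{x} \rangle = 0$ if and only if $\mathbf{x} = \mathbf{0}$. If $\mathbf{x} = \mathbf{0}$, then $\langle \mathbf{x}, \mathbf{x} \rangle = \mathbf{0}^TA\mathbf{0} = 0$, so it just remains to show that $\langle \mathbf{x}, \mathbf{x} \rangle > 0$ if $\mathbf{x} \ne \mathbf{0}$.

Assume $\mathbf{x} = (x_1, x_2)^T$ is any non-zero vector in $\mathbb{R}^2$. If $x_2 = 0$, then $x_1 \ne 0$ (otherwise, we would have the zero vector), and

$$\langle \mathbf{x}, \mathbf{x} \rangle = (x_1 \;\; 0)\begin{pmatrix}5&2\\2&1\end{pmatrix}\begin{pmatrix}x_1\\0\end{pmatrix} = 5x_1^2 > 0.$$

If $x_2 \ne 0$, then

$$\langle \mathbf{x}, \mathbf{x} \rangle = (x_1 \;\; x_2)\begin{pmatrix}5&2\\2&1\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = 5x_1^2 + 4x_1x_2 + x_2^2 = x_2^2\left(5\frac{x_1^2}{x_2^2} + 4\frac{x_1}{x_2} + 1\right) = x_2^2(5t^2 + 4t + 1), \quad\text{for } t = \frac{x_1}{x_2} \in \mathbb{R}.$$

Now $f(t) = 5t^2 + 4t + 1$ is a quadratic function whose graph is a parabola. To see if it crosses the $t$ axis, we look for the solutions of $5t^2 + 4t + 1 = 0$, which are given by the quadratic formula: $t = (-4 \pm \sqrt{16 - 4(5)(1)})/10$. So there are no real solutions, therefore $5t^2 + 4t + 1$ is either always strictly positive or strictly negative, and if $t = 1$, for example, it is positive, so we can conclude that

$$\langle \mathbf{x}, \mathbf{x} \rangle = x_2^2(5t^2 + 4t + 1) > 0 \quad\text{for all } \mathbf{x} \ne \mathbf{0} \in \mathbb{R}^2.$$

Therefore, this is an inner product on $\mathbb{R}^2$.

(a) Using this inner product, for the given vectors $\mathbf{v}$ and $\mathbf{w}$,

$$\langle \mathbf{v}, \mathbf{w} \rangle = (1 \;\; 1)\begin{pmatrix}5&2\\2&1\end{pmatrix}\begin{pmatrix}-1\\2\end{pmatrix} = (7 \;\; 3)\begin{pmatrix}-1\\2\end{pmatrix} = -1.$$

(b) The norm of $\mathbf{v}$ satisfies

$$\|\mathbf{v}\|^2 = \langle \mathbf{v}, \mathbf{v} \rangle = (1 \;\; 1)\begin{pmatrix}5&2\\2&1\end{pmatrix}\begin{pmatrix}1\\1\end{pmatrix} = (7 \;\; 3)\begin{pmatrix}1\\1\end{pmatrix} = 10,$$

so $\|\mathbf{v}\| = \sqrt{10}$.

(c) You need to find the set of all vectors $\mathbf{x} = (x, y)^T$ for which $\langle \mathbf{v}, \mathbf{x} \rangle = 0$; that is,

$$\langle \mathbf{v}, \mathbf{x} \rangle = (1 \;\; 1)\begin{pmatrix}5&2\\2&1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = (7 \;\; 3)\begin{pmatrix}x\\y\end{pmatrix} = 7x + 3y = 0.$$

Therefore, $S^\perp = \{(x, y)^T \mid 7x + 3y = 0\}$. A basis of $S^\perp$ is $\{\mathbf{n}\}$, where $\mathbf{n} = (-3, 7)^T$.

(d) A basis of $S = \mathrm{Lin}(\mathbf{v})$ is the vector $\mathbf{v}$. Therefore, you need to express $\mathbf{w}$ as a linear combination of $\mathbf{v}$ and the basis vector $\mathbf{n} = (-3, 7)^T$ of $S^\perp$. That is, find $a, b$ such that

$$\begin{pmatrix}-1\\2\end{pmatrix} = a\begin{pmatrix}1\\1\end{pmatrix} + b\begin{pmatrix}-3\\7\end{pmatrix}.$$

You can solve this directly by writing out the equations and solving them simultaneously, or by using an inverse matrix,

$$\begin{pmatrix}a\\b\end{pmatrix} = \frac{1}{10}\begin{pmatrix}7&3\\-1&1\end{pmatrix}\begin{pmatrix}-1\\2\end{pmatrix} = \frac{1}{10}\begin{pmatrix}-1\\3\end{pmatrix}.$$

Check your answer:

$$\begin{pmatrix}-1\\2\end{pmatrix} = -\frac{1}{10}\begin{pmatrix}1\\1\end{pmatrix} + \frac{3}{10}\begin{pmatrix}-3\\7\end{pmatrix}.$$

(e) The linearly independent vectors $\mathbf{v}$ and $\mathbf{n}$ are orthogonal under this inner product, so all you need to do is to normalise them. You already know the length of $\mathbf{v}$; the length of $\mathbf{n}$ is given by

$$\|\mathbf{n}\|^2 = \langle \mathbf{n}, \mathbf{n} \rangle = (-3 \;\; 7)\begin{pmatrix}5&2\\2&1\end{pmatrix}\begin{pmatrix}-3\\7\end{pmatrix} = (-1 \;\; 1)\begin{pmatrix}-3\\7\end{pmatrix} = 10,$$

so $\|\mathbf{n}\| = \sqrt{10}$. Therefore,

$$\mathbf{u}_1 = \frac{1}{\sqrt{10}}\begin{pmatrix}1\\1\end{pmatrix}, \qquad \mathbf{u}_2 = \frac{1}{\sqrt{10}}\begin{pmatrix}-3\\7\end{pmatrix}$$

form an orthonormal basis of $\mathbb{R}^2$ with the given inner product.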
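
The calculations in parts (a) to (e) all use the same bilinear form $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^TA\mathbf{y}$, so they can be checked together. The sketch below assumes $A$, $\mathbf{v}$, $\mathbf{w}$ and $\mathbf{n}$ as in Exercise 10.5; `inner` is an illustrative helper name.

```python
from fractions import Fraction as F

A = [[5, 2], [2, 1]]

def inner(x, y):
    """The inner product <x, y> = x^T A y on R^2."""
    return sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))

v, w, n = (1, 1), (-1, 2), (-3, 7)

# Part (d): w = a v + b n with a = -1/10 and b = 3/10.
a, b = F(-1, 10), F(3, 10)
combo = tuple(a * vi + b * ni for vi, ni in zip(v, n))

print(inner(v, w), inner(v, v), inner(n, n), inner(v, n), combo == w)
```

Note that $\mathbf{v}$ and $\mathbf{n}$ are not orthogonal for the standard dot product ($\mathbf{v} \cdot \mathbf{n} = 4$); orthogonality here is with respect to the inner product defined by $A$.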


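Exercise 10.6 To start with,

$$\mathbf{u}_1 = \mathbf{v}_1/\|\mathbf{v}_1\| = (1/\sqrt{2})(1, 0, 1, 0)^T.$$

Then we let

$$\mathbf{w}_2 = \mathbf{v}_2 - \langle \mathbf{v}_2, \mathbf{u}_1 \rangle\mathbf{u}_1 = (0, 2, 0, 1)^T.$$

Now check that $\mathbf{w}_2 \perp \mathbf{u}_1$ by calculating $\langle \mathbf{w}_2, \mathbf{u}_1 \rangle = 0$. Then

$$\mathbf{u}_2 = \frac{\mathbf{w}_2}{\|\mathbf{w}_2\|} = \frac{1}{\sqrt{5}}(0, 2, 0, 1)^T.$$

Next (and you should fill in the missing steps),

$$\mathbf{w}_3 = \mathbf{v}_3 - \langle \mathbf{v}_3, \mathbf{u}_2 \rangle\mathbf{u}_2 - \langle \mathbf{v}_3, \mathbf{u}_1 \rangle\mathbf{u}_1 = \cdots$$

Now let $\mathbf{w}_3 = (-5, -1, 5, 2)^T$ (for ease of calculation) and check that $\mathbf{w}_3$ is perpendicular to both $\mathbf{u}_1$ and $\mathbf{u}_2$.

Normalising $\mathbf{w}_3$, we obtain

$$\mathbf{u}_3 = \frac{1}{\sqrt{55}}(-5, -1, 5, 2)^T.$$

The required basis is $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$.
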
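Exercise 10.7 Choose two linearly independent vectors in the plane as a basis. (This is most easily done by choosing one of the components to be 0, another to be equal to 1, say, and then solving the equation for the third. By choosing the zeros differently, you obtain linearly independent vectors.) For example, let

$$\mathbf{v}_1 = \begin{pmatrix}2\\1\\0\end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix}-3\\0\\1\end{pmatrix}.$$

Then $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a basis of $W$. Now use Gram-Schmidt orthonormalisation. Set $\mathbf{u}_1 = (2/\sqrt{5}, 1/\sqrt{5}, 0)^T$ and

$$\mathbf{w} = \mathbf{v}_2 - \langle \mathbf{v}_2, \mathbf{u}_1 \rangle\mathbf{u}_1 = \begin{pmatrix}-3\\0\\1\end{pmatrix} - \begin{pmatrix}-\frac{12}{5}\\-\frac65\\0\end{pmatrix} = \begin{pmatrix}-\frac35\\\frac65\\1\end{pmatrix}.$$

The vector $\mathbf{w}_2 = (3, -6, -5)^T$ is parallel to $\mathbf{w}$. (This is a good time to check that $\mathbf{w}_2 \perp \mathbf{u}_1$ and also that $\mathbf{w}_2 \in W$.) Now set $\mathbf{u}_2 = (3/\sqrt{70}, -6/\sqrt{70}, -5/\sqrt{70})^T$. The set $\{\mathbf{u}_1, \mathbf{u}_2\}$ is an orthonormal basis of $W$.

To extend this to an orthonormal basis of $\mathbb{R}^3$, note that $W$ is a plane with normal vector $\mathbf{n} = (1, -2, 3)^T$, so $\mathbf{n}$ is perpendicular to every vector in $W$. If you set $\mathbf{u}_3 = (1/\sqrt{14}, -2/\sqrt{14}, 3/\sqrt{14})^T$, then $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is an orthonormal basis of $\mathbb{R}^3$ as required.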
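
The Gram-Schmidt steps above can be sketched in a few lines. The starting vectors $(2,1,0)^T$ and $(-3,0,1)^T$ are an assumption (one natural choice of basis for the plane $x - 2y + 3z = 0$), the normal $(1,-2,3)^T$ is appended to extend to $\mathbb{R}^3$, and the function names are illustrative.

```python
import math

def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def normalise(x):
    """Scale a non-zero vector to length 1."""
    length = math.sqrt(dot(x, x))
    return [a / length for a in x]

def gram_schmidt(vectors):
    """Orthonormalise linearly independent vectors in the given order."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = dot(v, u)                       # component of v along u
            w = [wi - c * ui for wi, ui in zip(w, u)]
        basis.append(normalise(w))
    return basis

u1, u2, u3 = gram_schmidt([(2, 1, 0), (-3, 0, 1), (1, -2, 3)])
print(u1, u2, u3)
```

This sketch returns $\mathbf{u}_2 = (-3, 6, 5)^T/\sqrt{70}$, the negative of the vector chosen in the text; both are unit vectors spanning the same line, so either gives an orthonormal basis.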
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Chapter 11 exercises 


Exercise 11.1 $A$ can be orthogonally diagonalised because it is symmetric.
The characteristic polynomial of $A$ is

\[
|A - \lambda I| = \begin{vmatrix} 7-\lambda & 0 & 9 \\ 0 & 2-\lambda & 0 \\ 9 & 0 & 7-\lambda \end{vmatrix}
 = (2-\lambda)[(7-\lambda)(7-\lambda) - 81]
 = (2-\lambda)(\lambda^2 - 14\lambda - 32)
 = (2-\lambda)(\lambda - 16)(\lambda + 2),
\]

where we have expanded the determinant using the middle row. So
the eigenvalues are $2, 16, -2$. An eigenvector for $\lambda = 2$ is given by
reducing the matrix $A - 2I$:

\[
A - 2I = \begin{pmatrix} 5 & 0 & 9 \\ 0 & 0 & 0 \\ 9 & 0 & 5 \end{pmatrix}
 \longrightarrow \cdots \longrightarrow
 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
\]

This means $x = z = 0$. So we may take $(0, 1, 0)^T$. This already has
length 1 so there is no need to normalise it. (Recall that we need
three eigenvectors which are of length 1.) For $\lambda = -2$, we find that an
eigenvector is $(-1, 0, 1)^T$ (or some multiple of this). To normalise (that
is, to make it of length 1), we divide by its length, which is $\sqrt{2}$, obtaining
$(1/\sqrt{2})(-1, 0, 1)^T$. For $\lambda = 16$, we find a normalised eigenvector is
$(1/\sqrt{2})(1, 0, 1)^T$. It follows that if we let

\[
P = \begin{pmatrix} 0 & -1/\sqrt{2} & 1/\sqrt{2} \\ 1 & 0 & 0 \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix},
\]

then $P$ is orthogonal and $P^TAP = D = \mathrm{diag}(2, -2, 16)$. Check this!
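The "Check this!" can be done by machine as well as by hand. The following Python sketch (helper names are hypothetical; the matrix A is read off from the characteristic determinant in this solution) confirms that each stated vector is an eigenvector for its eigenvalue and that the three vectors are mutually orthogonal, which is what makes P orthogonal once they are normalised.

```python
A = [[7, 0, 9],
     [0, 2, 0],
     [9, 0, 7]]

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# (eigenvalue, unnormalised eigenvector) pairs from the solution
eigenpairs = [(2, [0, 1, 0]), (-2, [-1, 0, 1]), (16, [1, 0, 1])]

is_eigenpair = [matvec(A, v) == [lam * x for x in v] for lam, v in eigenpairs]

vectors = [v for _, v in eigenpairs]
mutually_orthogonal = all(
    dot(vectors[i], vectors[j]) == 0
    for i in range(3) for j in range(i + 1, 3)
)
```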


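Exercise 11.2 To show that $\mathbf{v}_1$ is an eigenvector of $A$, find $A\mathbf{v}_1$:

\[
A\mathbf{v}_1 = \begin{pmatrix} 2 & 1 & -2 \\ 1 & 2 & 2 \\ -2 & 2 & -1 \end{pmatrix}
 \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \\ 0 \end{pmatrix} = 3\mathbf{v}_1,
\]

so $\mathbf{v}_1$ is an eigenvector corresponding to $\lambda_1 = 3$.
For the eigenvectors,

\[
A - 3I = \begin{pmatrix} -1 & 1 & -2 \\ 1 & -1 & 2 \\ -2 & 2 & -4 \end{pmatrix}
 \longrightarrow \begin{pmatrix} 1 & -1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\]

with solutions

\[
\mathbf{x} = \begin{pmatrix} s - 2t \\ s \\ t \end{pmatrix}
 = s \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + t \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}
 = s\mathbf{v}_1 + t\mathbf{v}_2, \qquad s, t \in \mathbb{R}.
\]

Therefore, a basis of the eigenspace is $\{\mathbf{v}_1, \mathbf{v}_2\}$.
To orthogonally diagonalise this matrix, you need to make this
basis into an orthonormal basis of the eigenspace, and you need to find
another eigenvalue and corresponding eigenvector. You have choices
available to you as to how to do each one.

You can find the remaining eigenvalue by finding the characteristic
equation, $|A - \lambda I| = 0$, and then find the corresponding eigenvector.
Alternatively, you know that the eigenspace of $\lambda_1 = 3$ is a plane in
$\mathbb{R}^3$, and you can deduce the normal to this plane from the reduced
row echelon form of $A - 3I$ to be the vector $\mathbf{v}_3 = (1, -1, 2)^T$, so this
must be the third eigenvector. Then you can find the corresponding
eigenvalue by

\[
A\mathbf{v}_3 = \begin{pmatrix} 2 & 1 & -2 \\ 1 & 2 & 2 \\ -2 & 2 & -1 \end{pmatrix}
 \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} = \begin{pmatrix} -3 \\ 3 \\ -6 \end{pmatrix} = -3\mathbf{v}_3.
\]

So $\lambda_3 = -3$. You still need to obtain an orthonormal basis for the
eigenspace of $\lambda_1 = 3$. Using Gram-Schmidt, set
$\mathbf{u}_1 = (1/\sqrt{2}, 1/\sqrt{2}, 0)^T$, then

\[
\mathbf{w} = \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}
 - \left\langle \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}, \mathbf{u}_1 \right\rangle \mathbf{u}_1
 = \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix} + \frac{2}{\sqrt{2}} \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{pmatrix}
 = \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}.
\]

Now check that $\mathbf{w}$ is indeed orthogonal to the other two eigenvectors.
Taking the unit eigenvectors, you can set

\[
P = \begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{3} & 1/\sqrt{6} \\ 1/\sqrt{2} & 1/\sqrt{3} & -1/\sqrt{6} \\ 0 & 1/\sqrt{3} & 2/\sqrt{6} \end{pmatrix}
 \quad \text{and} \quad
D = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -3 \end{pmatrix}.
\]

Then $P^T = P^{-1}$ ($P$ is an orthogonal matrix) and

\[ P^TAP = P^{-1}AP = D. \]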
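As a numerical cross-check of this diagonalisation, the sketch below builds P from the three eigenvectors (1, 1, 0), (-1, 1, 1) and (1, -1, 2), normalised to unit length, and evaluates P^T A P in floating point. The helper names are hypothetical and the comparison uses a small tolerance rather than exact equality.

```python
import math

A = [[2, 1, -2],
     [1, 2, 2],
     [-2, 2, -1]]

# Orthogonal eigenvectors for the eigenvalues 3, 3 and -3 respectively.
eigenvectors = [[1, 1, 0], [-1, 1, 1], [1, -1, 2]]

def normalise(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

P = transpose([normalise(v) for v in eigenvectors])   # unit eigenvectors as columns
PtP = matmul(transpose(P), P)                         # should be the identity
PtAP = matmul(matmul(transpose(P), A), P)             # should be diag(3, 3, -3)
```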


Exercise 11.3 The matrix representing the quadratic form $q(x, y, z)$ is

\[
A = \begin{pmatrix} -4 & 3 & 1 \\ 3 & -2 & -2 \\ 1 & -2 & -4 \end{pmatrix}.
\]

The first two principal minors are

\[ a_{11} = -4 \quad \text{and} \quad \begin{vmatrix} -4 & 3 \\ 3 & -2 \end{vmatrix} = -1. \]

The first is negative and the second is also negative. If the matrix (and the
quadratic form) were positive definite, both should be positive. If it were
negative definite, the first should be negative and the second positive.
So it is neither. Since $|A| = 10 \neq 0$, the quadratic form is indefinite.
For the second quadratic form, $f(x, y, z) = \mathbf{x}^T B \mathbf{x}$, the matrix is

\[
B = \begin{pmatrix} -4 & 1 & 3 \\ 1 & -2 & -2 \\ 3 & -2 & -4 \end{pmatrix}.
\]

For this matrix, the principal minors are

\[ b_{11} = -4, \qquad \begin{vmatrix} -4 & 1 \\ 1 & -2 \end{vmatrix} = 7, \qquad |B| = -6. \]

Therefore, this quadratic form is negative definite.
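The principal-minor test used here is easy to mechanise. The sketch below is illustrative only: the function names are hypothetical, and the entries of A and B are taken as read from this solution (an assumption wherever the printed matrices are unclear); they are consistent with the stated values |A| = 10 and |B| = -6.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def leading_principal_minors(M):
    """Determinants of the top-left 1x1, 2x2, ..., nxn submatrices."""
    return [det([row[:k] for row in M[:k]]) for k in range(1, len(M) + 1)]

A = [[-4, 3, 1], [3, -2, -2], [1, -2, -4]]   # assumed entries
B = [[-4, 1, 3], [1, -2, -2], [3, -2, -4]]   # assumed entries

minors_A = leading_principal_minors(A)
minors_B = leading_principal_minors(B)

# Negative definite: the minors alternate in sign, starting with a negative one.
B_negative_definite = all((-1) ** k * m > 0 for k, m in enumerate(minors_B, start=1))
```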


Exercise 11.4 We found an orthogonal matrix $P$ for which $P^TAP = D =
\mathrm{diag}(2, -2, 16)$. Changing the order of the columns to satisfy the
condition $\lambda_1 > \lambda_2 > \lambda_3$, let $Q = (\mathbf{u}_1\ \mathbf{u}_2\ \mathbf{u}_3)$ be the matrix

\[
Q = \begin{pmatrix} 1/\sqrt{2} & 0 & -1/\sqrt{2} \\ 0 & 1 & 0 \\ 1/\sqrt{2} & 0 & 1/\sqrt{2} \end{pmatrix},
 \quad \text{so that} \quad
Q^TAQ = D = \begin{pmatrix} 16 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -2 \end{pmatrix}.
\]

Then, if $\mathbf{x} = Q\mathbf{z}$, with $\mathbf{x} = (x, y, z)^T$ and $\mathbf{z} = (X, Y, Z)^T$, we have

\[ f(x, y, z) = \mathbf{x}^T A \mathbf{x} = \mathbf{z}^T D \mathbf{z} = 16X^2 + 2Y^2 - 2Z^2. \]

Since $A$ has both positive and negative eigenvalues, the quadratic form
is indefinite.

To find $\mathbf{a} = (a, b, c)$ such that $f(a, b, c) = -8$, look at the expression
for $f(x, y, z)$ in the coordinates with respect to the basis $B$ of
eigenvectors of $A$, which are the columns of $Q$. The unit eigenvector
$\mathbf{u}_3 = (-1/\sqrt{2}, 0, 1/\sqrt{2})^T$ has $B$ coordinates $[0, 0, 1]_B$ and will
therefore give the value $f(x, y, z) = -2Z^2 = -2$. So to obtain the value
$f(x, y, z) = -8$, we can take the vector $[0, 0, 2]_B$, which in standard
coordinates is $2\mathbf{u}_3$. You can check by substituting these values into $f$
that, indeed, $f(-2/\sqrt{2}, 0, 2/\sqrt{2}) = -8$.
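The final substitution can be checked numerically. In this sketch the quadratic form is written out from the symmetric matrix of Exercise 11.1, taken here to be A = [[7, 0, 9], [0, 2, 0], [9, 0, 7]] (an assumption based on that solution), and evaluated at u3 and at 2u3.

```python
import math

def f(x, y, z):
    """Quadratic form x^T A x for the assumed matrix A = [[7,0,9],[0,2,0],[9,0,7]]."""
    return 7 * x * x + 2 * y * y + 7 * z * z + 18 * x * z

u3 = (-1 / math.sqrt(2), 0.0, 1 / math.sqrt(2))

value_at_u3 = f(*u3)                        # corresponds to [0, 0, 1]_B
value_at_2u3 = f(*(2 * c for c in u3))      # corresponds to [0, 0, 2]_B
```

Doubling the vector multiplies the value of a quadratic form by four, which is why 2u3 is the right choice for the target value.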


Exercise 11.5 If $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the standard basis vectors in $\mathbb{R}^n$, then
$\mathbf{e}_i$ has 1 as its $i$th component and 0 elsewhere. If $A$ is positive
definite, then $\mathbf{x}^T A \mathbf{x} > 0$ for every non-zero $\mathbf{x}$, so in particular

\[ \mathbf{e}_i^T A \mathbf{e}_i = a_{ii} > 0. \]

The converse of this statement, however, is far from true. There are
many matrices with positive numbers on the main diagonal which are
not positive definite. For example, the matrix

\[ \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} \]

has eigenvalues $\lambda = 3, -1$, so it is indefinite.


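Exercise 11.6 The matrix $B^TB$ is a $k \times k$ symmetric matrix since
$(B^TB)^T = B^T(B^T)^T = B^TB$. To show it is a positive definite matrix, we
need to show that $\mathbf{x}^T B^T B \mathbf{x} \geq 0$ for any vector $\mathbf{x} \in \mathbb{R}^k$, and $\mathbf{x}^T B^T B \mathbf{x} = 0$
if and only if $\mathbf{x} = \mathbf{0}$. We have, for all $\mathbf{x} \in \mathbb{R}^k$,

\[ \mathbf{x}^T B^T B \mathbf{x} = (B\mathbf{x})^T (B\mathbf{x}) = \langle B\mathbf{x}, B\mathbf{x} \rangle = \|B\mathbf{x}\|^2, \]

which is positive for all $B\mathbf{x} \neq \mathbf{0}$; and $\|B\mathbf{x}\|^2 = 0$ if and only if $B\mathbf{x} = \mathbf{0}$.

Since $\mathrm{rank}(B) = k$, the reduced row echelon form of $B$ will have a
leading one in every column, so the only solution of $B\mathbf{x} = \mathbf{0}$ is $\mathbf{x} = \mathbf{0}$.
Therefore, $\mathbf{x}^T (B^TB) \mathbf{x} > 0$ for all $\mathbf{x} \neq \mathbf{0}$, and $\mathbf{x}^T (B^TB) \mathbf{x} = 0$ if and only
if $\mathbf{x} = \mathbf{0}$. Hence the matrix $B^TB$ is positive definite.

If an $n \times n$ symmetric matrix $A$ is positive definite, then all of its
eigenvalues are positive, so 0 is not an eigenvalue of $A$. Therefore, the system
of equations $A\mathbf{x} = \mathbf{0}$ has no non-trivial solution, and so $A$ is invertible.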


Exercise 11.7 The matrix $A$ is

\[
A = \begin{pmatrix} 1 & -2 & -1 \\ -2 & 5 & 3 \\ -1 & 3 & 2 \end{pmatrix}.
\]

To determine if $A$ is positive definite, negative definite or indefinite, we
consider the principal minors:

\[
a_{11} = 1 > 0, \qquad \begin{vmatrix} 1 & -2 \\ -2 & 5 \end{vmatrix} = 1 > 0,
 \qquad |A| = 1(10 - 9) + 2(-4 + 3) - 1(-6 + 5) = 0.
\]

Since $a_{11} = 1 > 0$, the matrix $A$ is not negative definite. Since $|A| = 0$,
one of the eigenvalues of $A$ is 0, so $A$ is not positive definite.
To determine if $A$ is indefinite, we need to find the eigenvalues.
Expanding the characteristic equation,

\[
|A - \lambda I| = \begin{vmatrix} 1-\lambda & -2 & -1 \\ -2 & 5-\lambda & 3 \\ -1 & 3 & 2-\lambda \end{vmatrix}
 = -\lambda^3 + 8\lambda^2 - 3\lambda = -\lambda(\lambda^2 - 8\lambda + 3).
\]

The roots are $\lambda = 0$ and, using the quadratic formula,

\[ \lambda = \frac{8 \pm \sqrt{64 - 12}}{2}. \]

So both roots of the quadratic equation are positive. Therefore, the
matrix is not indefinite; it is positive semi-definite. Hence there is
no point $(a, b, c)$ for which $f(a, b, c) < 0$.
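The coefficients of the characteristic polynomial and the sign of its roots can be confirmed with a few lines of Python. This is a sketch with hypothetical names; it uses the standard fact that, for a 3 × 3 matrix, |A - λI| = -λ³ + (trace)λ² - (sum of the 2 × 2 principal minors)λ + |A|.

```python
import math

A = [[1, -2, -1],
     [-2, 5, 3],
     [-1, 3, 2]]

trace = sum(A[i][i] for i in range(3))

def principal_minor_2x2(i, j):
    """2x2 principal minor built from rows and columns i and j."""
    return A[i][i] * A[j][j] - A[i][j] * A[j][i]

sum_of_minors = (principal_minor_2x2(0, 1) + principal_minor_2x2(0, 2)
                 + principal_minor_2x2(1, 2))

det_A = (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
         - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
         + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

# With det_A == 0 the polynomial factorises as -λ(λ² - trace·λ + sum_of_minors).
discriminant = trace ** 2 - 4 * sum_of_minors
nonzero_roots = [(trace - math.sqrt(discriminant)) / 2,
                 (trace + math.sqrt(discriminant)) / 2]
```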


Exercise 11.8 We have $q(x, y) = \mathbf{x}^T A \mathbf{x} = 5x^2 - 6xy + 5y^2 = 2$. This
time we orthogonally diagonalise the matrix

\[ A = \begin{pmatrix} 5 & -3 \\ -3 & 5 \end{pmatrix} \]

using $Q = (\mathbf{w}_1\ \mathbf{w}_2)$, where $\mathbf{w}_1$ and $\mathbf{w}_2$ are unit eigenvectors of $A$.
Then

\[ Q^TAQ = D = \begin{pmatrix} 8 & 0 \\ 0 & 2 \end{pmatrix}, \]

since $A\mathbf{w}_1 = 8\mathbf{w}_1$ and $A\mathbf{w}_2 = 2\mathbf{w}_2$. Then, if $\mathbf{x} = Q\mathbf{z}$, in the new
coordinates, $\mathbf{z} = (X, Y)^T$, the equation becomes

\[ \mathbf{x}^T A \mathbf{x} = \mathbf{z}^T D \mathbf{z} = 8X^2 + 2Y^2 = 2, \]

or $4X^2 + Y^2 = 1$. To sketch this, we first need to find the positions
of the new $X$ and $Y$ axes. This time we must rely only on the eigenvectors.
The new $X$ axis is in the direction of the vector $\mathbf{w}_1$, and the
new $Y$ axis is in the direction of $\mathbf{w}_2$. (So this linear transformation
is actually a reflection about a line through the origin.) We first draw
these new axes, and then sketch the ellipse in the usual way. This time
it intersects the $X$ axis in $X = \pm(1/2)$ and the $Y$ axis in $Y = \pm 1$,
as shown below. Notice that this is exactly the same ellipse as in
Example 11.41 (as it should be!); we merely used a different change of
basis to sketch it.
Exercise 11.9 The matrix $A$ is

\[ A = \begin{pmatrix} 9 & 2 \\ 2 & 6 \end{pmatrix}. \]


Its eigenvalues are 5 and 10. Since these are both positive, the quadratic
form is positive definite. This also indicates that there is an orthogonal
matrix $P$ such that $P^TAP = D = \mathrm{diag}(5, 10)$, and so that $\mathbf{x}^T A \mathbf{x} = 10$
is an ellipse.

The vectors $\mathbf{v}_1 = (-1, 2)^T$ and $\mathbf{v}_2 = (2, 1)^T$ are eigenvectors of $A$
corresponding to the eigenvalues 5 and 10, respectively. If

\[
P = \begin{pmatrix} 2/\sqrt{5} & -1/\sqrt{5} \\ 1/\sqrt{5} & 2/\sqrt{5} \end{pmatrix}
 \quad \text{and} \quad
D = \begin{pmatrix} 10 & 0 \\ 0 & 5 \end{pmatrix},
\]

then $P^TAP = D$. Set $\mathbf{x} = P\mathbf{z}$ with $\mathbf{x} = (x, y)^T$, $\mathbf{z} = (X, Y)^T$. Then the
curve $\mathbf{x}^T A \mathbf{x} = 10$ becomes

\[ \mathbf{x}^T A \mathbf{x} = \mathbf{z}^T D \mathbf{z} = 10X^2 + 5Y^2 = 10, \]

which is an ellipse. The linear transformation defined by $P$ is a rotation,
but not by an angle which we recognise. Therefore, the images of the $x$
and $y$ axes under $P$ are found by looking at the images of the standard
basis vectors under the linear transformation defined by $P$. Thus, the
direction of the positive $X$ axis is given by $\mathbf{v}_2 = (2, 1)^T$ and the direction
of the positive $Y$ axis is given by the vector $\mathbf{v}_1 = (-1, 2)^T$. We now
sketch the ellipse in standard position on the $X$ and $Y$ axes.

It intersects the $X$ axis at $X = \pm 1$ and the $Y$ axis at $Y = \pm\sqrt{2}$.
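Exercise 11.10 The vectors $\mathbf{v}_1 = (1, 1)^T$ and $\mathbf{v}_2 = (-1, 1)^T$ are
eigenvectors of $A$ since

\[
A\mathbf{v}_1 = \begin{pmatrix} a & b \\ b & a \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix}
 = \begin{pmatrix} a + b \\ a + b \end{pmatrix} = (a + b)\mathbf{v}_1
\]

and

\[
A\mathbf{v}_2 = \begin{pmatrix} a & b \\ b & a \end{pmatrix} \begin{pmatrix} -1 \\ 1 \end{pmatrix}
 = \begin{pmatrix} -a + b \\ -b + a \end{pmatrix} = (a - b)\mathbf{v}_2.
\]

So the eigenvalues are $\lambda_1 = a + b$ and $\lambda_2 = a - b$, respectively.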


Exercise 11.11 To sketch the curve $x^2 + y^2 + 6xy = 4$ in the $xy$-plane,
we write

\[ x^2 + y^2 + 6xy = \mathbf{x}^T A \mathbf{x} \quad \text{with} \quad A = \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}, \]

and orthogonally diagonalise $A$. The eigenvalues of $A$ are $\lambda_1 = 4$
and $\lambda_2 = -2$, with corresponding eigenvectors $\mathbf{v}_1 = (1, 1)^T$ and
$\mathbf{v}_2 = (-1, 1)^T$. Therefore, we can take

\[
P = \begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix},
 \qquad D = \begin{pmatrix} 4 & 0 \\ 0 & -2 \end{pmatrix},
\]

so that $P^{-1}AP = P^TAP = D$ and $P$ defines a rotation anticlockwise
by $\pi/4$ radians. Then setting $\mathbf{x} = P\mathbf{z}$, with $\mathbf{z} = (X, Y)^T$,

\[ x^2 + y^2 + 6xy = \mathbf{x}^T A \mathbf{x} = \mathbf{z}^T D \mathbf{z} = 4X^2 - 2Y^2 = 4. \]

So we need to sketch the hyperbola $2X^2 - Y^2 = 2$ on the new $X$, $Y$
axes. This intersects only the $X$ axis, at $X = \pm 1$. However, sketching
the hyperbola is more difficult than sketching an ellipse: we also need
to sketch the asymptotes so we know its shape. The asymptotes are
$Y = \pm\sqrt{2}X$. You can find the equations of these asymptotes in the
standard $x$, $y$ coordinates using $\mathbf{z} = P^T\mathbf{x}$ to substitute for $X$ and $Y$ in
the two equations, $Y = \sqrt{2}X$ and $Y = -\sqrt{2}X$, and then sketch the two
lines in $x$, $y$ coordinates. But we are only required to do a sketch, so you
can reason as follows. The line $Y = \sqrt{2}X$ has a steeper slope than the
line $Y = X$, and because we have rotated by $\pi/4$ radians anticlockwise,
the line $Y = X$ is just the (old) $y$ axis. Using this idea, you can sketch
the asymptotes, and then the hyperbola.

Knowing the points of intersection with the old axes helps here.
The points of intersection with the new $X$ axis are $X = \pm 1$.
The points of intersection with the old $x$ and $y$ axes are given
by setting, respectively, $y = 0$ and $x = 0$ in the original equation,
$x^2 + y^2 + 6xy = 4$. These are $x = \pm 2$ and $y = \pm 2$. Here is the sketch.
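The change of variable in this solution can be sanity-checked numerically. The sketch below (function names are hypothetical) applies the rotation by π/4, x = (X - Y)/√2 and y = (X + Y)/√2, at a few sample points and confirms that the original expression and 4X² - 2Y² agree there.

```python
import math

def old_form(x, y):
    """The quadratic form in the original coordinates."""
    return x * x + y * y + 6 * x * y

def new_form(X, Y):
    """The diagonalised form in the rotated coordinates."""
    return 4 * X * X - 2 * Y * Y

def to_old_coordinates(X, Y):
    """x = Pz for the rotation P through pi/4 anticlockwise."""
    s = 1 / math.sqrt(2)
    return (s * (X - Y), s * (X + Y))

samples = [(1.0, 0.0), (0.0, 1.0), (2.0, -3.0), (-1.5, 0.25)]
forms_agree = all(
    math.isclose(old_form(*to_old_coordinates(X, Y)), new_form(X, Y), abs_tol=1e-12)
    for X, Y in samples
)

# The point X = 1, Y = 0 lies on the curve; in the old coordinates it is (1/√2, 1/√2).
on_curve = math.isclose(old_form(*to_old_coordinates(1.0, 0.0)), 4.0)
```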
Chapter 12 exercises


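Exercise 12.1 Suppose $\dim(S) = r$. (Let us assume that $0 < r < n$, the
cases $r = 0$ and $r = n$ being easy to deal with separately.) The proof
of Theorem 12.13 shows us that there is an orthonormal basis of $\mathbb{R}^n$ of
the form $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_r, \mathbf{e}_{r+1}, \ldots, \mathbf{e}_n\}$, where $\{\mathbf{e}_1, \ldots, \mathbf{e}_r\}$ is a basis of
$S$ and $\{\mathbf{e}_{r+1}, \ldots, \mathbf{e}_n\}$ is a basis of $S^\perp$. Make sure you understand why.
So this means that $\dim(S^\perp) = n - r = n - \dim(S)$.


Exercise 12.2 If $\mathbf{z} \in S^\perp$, then, for all $\mathbf{s} \in S$, $\langle \mathbf{z}, \mathbf{s} \rangle = 0$. Now, let $\mathbf{s} \in S$.
Then, for any $\mathbf{z} \in S^\perp$, we have $\langle \mathbf{z}, \mathbf{s} \rangle = 0$. So, for all $\mathbf{z} \in S^\perp$, we have
$\langle \mathbf{s}, \mathbf{z} \rangle = 0$. But this shows that $\mathbf{s}$ is orthogonal to every member of $S^\perp$.
That is, $\mathbf{s} \in (S^\perp)^\perp$. Hence, $S \subseteq (S^\perp)^\perp$. Now, by the previous exercise,

\[ \dim((S^\perp)^\perp) = n - \dim(S^\perp) = n - (n - \dim(S)) = \dim(S). \]

So $S \subseteq (S^\perp)^\perp$, and both are subspaces of the same dimension. Hence
$S = (S^\perp)^\perp$.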


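Exercise 12.3 The orthogonal projection of $\mathbb{R}^4$ onto the subspace
spanned by the vectors $(1, 0, 1, 0)^T$ and $(1, 2, 1, 2)^T$ is the same as
the orthogonal projection onto $R(A)$, where $A$ is the rank 2 matrix

\[
A = \begin{pmatrix} 1 & 1 \\ 0 & 2 \\ 1 & 1 \\ 0 & 2 \end{pmatrix}.
\]

The projection is therefore represented by the matrix

\[
P = A(A^TA)^{-1}A^T = \begin{pmatrix} \frac12 & 0 & \frac12 & 0 \\ 0 & \frac12 & 0 & \frac12 \\ \frac12 & 0 & \frac12 & 0 \\ 0 & \frac12 & 0 & \frac12 \end{pmatrix}.
\]

(Check this!) That is, for $\mathbf{x} = (x, y, z, w)^T \in \mathbb{R}^4$, the orthogonal
projection of $\mathbf{x}$ onto the subspace is given by

\[
\mathbf{x} \mapsto \begin{pmatrix} \frac12 x + \frac12 z \\ \frac12 y + \frac12 w \\ \frac12 x + \frac12 z \\ \frac12 y + \frac12 w \end{pmatrix}.
\]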
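The formula P = A(A^T A)^{-1} A^T can be evaluated exactly with rational arithmetic. The sketch below is an illustration with hypothetical helper names; it takes A to be the 4 × 2 matrix whose columns are the two spanning vectors (1, 0, 1, 0)^T and (1, 2, 1, 2)^T, and then confirms the two defining properties of an orthogonal projection matrix, idempotence and symmetry.

```python
from fractions import Fraction

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix with non-zero determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

columns = [[1, 0, 1, 0], [1, 2, 1, 2]]                   # the spanning vectors
A = transpose([[Fraction(x) for x in col] for col in columns])
At = transpose(A)

P = matmul(matmul(A, inverse_2x2(matmul(At, A))), At)    # A (A^T A)^{-1} A^T

is_idempotent = matmul(P, P) == P
is_symmetric = transpose(P) == P
```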


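Exercise 12.4 You have already shown in Activity 12.24 that the only
eigenvalues of an idempotent matrix are $\lambda = 1$ and $\lambda = 0$. Since $A$
can be diagonalised, there is an invertible matrix $P$ and a diagonal
matrix $D = \mathrm{diag}(1, \ldots, 1, 0, \ldots, 0)$ such that $P^{-1}AP = D$. Let
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ denote the column vectors of $P$ with $\mathbf{v}_1, \ldots, \mathbf{v}_i$ being
eigenvectors for $\lambda = 1$ and $\mathbf{v}_{i+1}, \ldots, \mathbf{v}_n$ eigenvectors for $\lambda = 0$. Then
$B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis of $\mathbb{R}^n$ and $\{\mathbf{v}_1, \ldots, \mathbf{v}_i\}$ is a basis of the
eigenspace, $E(1)$, for eigenvalue $\lambda = 1$.

If $\mathbf{y} \in R(A)$, then $\mathbf{y} = A\mathbf{x}$ for some $\mathbf{x} \in \mathbb{R}^n$. Now, $\mathbf{x}$ can be written
as a unique linear combination $\mathbf{x} = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_n\mathbf{v}_n$. We
therefore have

\[
\begin{aligned}
\mathbf{y} = A\mathbf{x} &= A(a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_n\mathbf{v}_n) \\
 &= a_1 A\mathbf{v}_1 + a_2 A\mathbf{v}_2 + \cdots + a_n A\mathbf{v}_n \\
 &= a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_i\mathbf{v}_i,
\end{aligned}
\]

since $A\mathbf{v}_j = \mathbf{v}_j$ for $j = 1, \ldots, i$ and $A\mathbf{v}_j = \mathbf{0}$ for $j = i+1, \ldots, n$.
Therefore, $\mathbf{y} \in E(1)$, so $R(A) \subseteq E(1)$. Now let $\mathbf{y} \in E(1)$. Then $\mathbf{y} \in \mathbb{R}^n$
and $A\mathbf{y} = \mathbf{y}$, so $\mathbf{y} \in R(A)$. This completes the proof.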
Exercise 12.5 You can show directly that $A^2 = A$ by multiplying the
matrices. Alternatively, you can use the diagonalisation. An invertible
matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$ are

\[
P = \begin{pmatrix} 2 & 1 & -4 \\ 1 & 0 & 3 \\ 0 & 1 & 1 \end{pmatrix}, \qquad
D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]

Then $D^2 = D$, and since $A = PDP^{-1}$,

\[ A^2 = PDP^{-1}PDP^{-1} = PD^2P^{-1} = PDP^{-1} = A, \]

so $A$ is idempotent.

Because $A$ is idempotent, the linear transformation given by
$T(\mathbf{x}) = A\mathbf{x}$ is idempotent, and is therefore a projection from $\mathbb{R}^3$ onto
the subspace $U = R(T)$ parallel to the subspace $W = N(T)$; that is,
$U = R(A)$ and $W = N(A)$. It is not an orthogonal projection because
$A$ is not symmetric.

The null space of $A$, namely $N(A)$, is the same as the eigenspace
corresponding to the eigenvalue $\lambda = 0$. Since $A$ is idempotent, by
Exercise 12.4, $R(A)$ is the eigenspace corresponding to the eigenvalue
$\lambda = 1$.


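Exercise 12.6 In matrix form, we want the least squares solution to
$A\mathbf{z} = \mathbf{b}$, where

\[
A = \begin{pmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
\mathbf{z} = \begin{pmatrix} a \\ b \end{pmatrix}, \qquad
\mathbf{b} = \begin{pmatrix} 0 \\ 1 \\ 3 \\ 9 \end{pmatrix}.
\]

So a least squares solution is

\[ \mathbf{z} = (A^TA)^{-1}A^T\mathbf{b} = \begin{pmatrix} 1.8 \\ 2.9 \end{pmatrix}. \]

(We've omitted the calculations, but you can check this.) So a best-fit
linear relationship is

\[ Y = 1.8 + 2.9X. \]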
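The omitted calculation amounts to solving the normal equations A^T A z = A^T b, which for a straight-line fit form a 2 × 2 system. The sketch below solves it exactly. The data points are an assumption: X = (-1, 0, 1, 2) and Y = (0, 1, 3, 9) are chosen because they reproduce the stated answer, and the variable names are hypothetical.

```python
from fractions import Fraction

X = [-1, 0, 1, 2]   # assumed data, consistent with the stated fit
Y = [0, 1, 3, 9]

n = len(X)
sum_x = sum(X)
sum_y = sum(Y)
sum_xx = sum(x * x for x in X)
sum_xy = sum(x * y for x, y in zip(X, Y))

# Normal equations for the model Y = a + bX (rows of A are (1, X_i)):
#   n*a     + sum_x*b  = sum_y
#   sum_x*a + sum_xx*b = sum_xy
det = n * sum_xx - sum_x * sum_x
a = Fraction(sum_y * sum_xx - sum_x * sum_xy, det)   # intercept, by Cramer's rule
b = Fraction(n * sum_xy - sum_x * sum_y, det)        # slope, by Cramer's rule
```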


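Exercise 12.7 In matrix form, this is $A\mathbf{z} = \mathbf{b}$, where

\[
A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{pmatrix}, \qquad
\mathbf{z} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}, \qquad
\mathbf{b} = \begin{pmatrix} 3 \\ 2 \\ 4 \\ 4 \end{pmatrix}.
\]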
Here, $A$ has rank 3, and the theory above tells us that a least squares
solution will be

\[ \mathbf{z} = (A^TA)^{-1}A^T\mathbf{b} = \begin{pmatrix} 2.75 \\ -0.25 \\ 0.25 \end{pmatrix}. \]

(Details of the calculation are omitted.) So the best-fit model of this
type is $Y = 2.75 - 0.25X + 0.25X^2$.


Chapter 13 exercises 


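Exercise 13.1 To solve $z^4 = -4$, write $z = re^{i\theta}$ and

\[ -4 = 4e^{i\pi} = 4e^{i(\pi + 2n\pi)}. \]

Then $z^4 = (re^{i\theta})^4 = 4e^{i(\pi + 2n\pi)}$, so that $r^4 = 4$ and $4\theta = \pi + 2n\pi$.
Therefore, $r = \sqrt{2}$ and $\theta = \frac{\pi}{4} + \frac{n\pi}{2}$; $n = 0, 1, 2, 3$ will give the four
complex roots. These are:

\[
\begin{aligned}
z_1 &= \sqrt{2}e^{i\pi/4} = \sqrt{2}\,\frac{1}{\sqrt{2}} + i\sqrt{2}\,\frac{1}{\sqrt{2}} = 1 + i, \\
z_2 &= \sqrt{2}e^{i3\pi/4} = -\sqrt{2}\,\frac{1}{\sqrt{2}} + i\sqrt{2}\,\frac{1}{\sqrt{2}} = -1 + i, \\
z_3 &= \sqrt{2}e^{i5\pi/4} = -1 - i = \bar{z}_2, \\
z_4 &= \sqrt{2}e^{i7\pi/4} = 1 - i = \bar{z}_1.
\end{aligned}
\]

To factorise $z^4 + 4$ into a product of quadratic factors with real
coefficients, write the polynomial as a product of linear factors in conjugate
pairs. For the first conjugate pair,

\[ (z - z_1)(z - \bar{z}_1) = z^2 - 2\,\mathrm{Re}(z_1)z + z_1\bar{z}_1 = z^2 - 2z + 2. \]

In the same way, $(z - z_2)(z - \bar{z}_2) = z^2 + 2z + 2$, so that

\[ z^4 + 4 = (z^2 - 2z + 2)(z^2 + 2z + 2). \]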
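The four roots and the factorisation can be checked numerically with Python's built-in complex numbers. This is a sketch only; the variable and function names are hypothetical.

```python
import cmath
import math

r = 4 ** 0.25   # the modulus, equal to sqrt(2)

# One root for each n = 0, 1, 2, 3, with argument pi/4 + n*pi/2.
roots = [cmath.rect(r, math.pi / 4 + n * math.pi / 2) for n in range(4)]

# Each root should satisfy z^4 + 4 = 0 up to rounding error.
residuals = [abs(z ** 4 + 4) for z in roots]

def quartic(z):
    return z ** 4 + 4

def factored(z):
    """Product of the two real quadratic factors."""
    return (z * z - 2 * z + 2) * (z * z + 2 * z + 2)
```

Evaluating `quartic` and `factored` at a handful of points, real or complex, is a quick way to catch a sign error in either factor.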


Exercise 13.2 The determinant of a complex matrix is calculated in the
same way as for a real matrix. You should find that

\[
|A| = \begin{vmatrix} 1 & i \\ 1+i & -1 \end{vmatrix} = -1 - i(i + 1) = -1 - i^2 - i = -i.
\]

You can solve the system using row reduction, or by finding $A^{-1}$ (exactly
as you do for a $2 \times 2$ real matrix). Then, $\mathbf{x} = A^{-1}\mathbf{b}$. We have

\[
A^{-1} = \frac{1}{-i} \begin{pmatrix} -1 & -i \\ -1-i & 1 \end{pmatrix}
 = \begin{pmatrix} -i & 1 \\ 1-i & i \end{pmatrix},
\]

from which $\mathbf{x} = A^{-1}\mathbf{b}$ can be computed.

Exercise 13.3 The set $W_1$ is not a subspace of $M_2(\mathbb{R})$ since


(Yom we co) Vem 


But $W_1$ is a subspace of $M_2(\mathbb{C})$. It is closed under addition and under
scalar multiplication since any complex number $w$ can be written as
$w = z^2$ for some complex number $z$.

Considered in $M_2(\mathbb{R})$, $\mathrm{Lin}(S)$ is the subspace consisting of all real linear
combinations of the matrices; considered in $M_2(\mathbb{C})$, it is the subspace
consisting of the much larger set of all complex linear combinations of
the matrices. These are very different sets; however, the basis will be
the same in each.

To find a basis, we need to eliminate any ‘vectors’ which are linear 
combinations of the other vectors in the spanning set. Note that 


Bee ata et M ae ee ae 

0 2 QO aye SE NO I Boe 
so that a basis (a linearly independent spanning set) of $\mathrm{Lin}(S)$ consists
of the first two matrices; that is,


ilo 2)(o af 


Hence $\mathrm{Lin}(S)$ is two-dimensional, considered either as a subspace of
$M_2(\mathbb{R})$ or of $M_2(\mathbb{C})$.


Exercise 13.4 The eigenvalues of a complex matrix are calculated in
the same way as for a real matrix. The characteristic equation is

\[
|A - \lambda I| = \begin{vmatrix} 1-\lambda & i \\ 1+i & -1-\lambda \end{vmatrix}
 = (1 - \lambda)(-1 - \lambda) - i(1 + i) = \lambda^2 - 1 - i - i^2 = \lambda^2 - i = 0.
\]

So the eigenvalues are the solutions of the equation $\lambda^2 = i$. To solve
this, write $\lambda = re^{i\theta}$. Then one solution is obtained by setting

\[ \lambda^2 = (re^{i\theta})^2 = r^2e^{i2\theta} = i = e^{i\pi/2}. \]

Equating the moduli and arguments, we obtain $r = 1$ and $\theta = \pi/4$, so
that

\[
\lambda_1 = e^{i\pi/4} = \cos\left(\frac{\pi}{4}\right) + i\sin\left(\frac{\pi}{4}\right) = \frac{1}{\sqrt{2}} + i\frac{1}{\sqrt{2}}.
\]

The other eigenvalue can be obtained by realising that $\lambda_2 = -\lambda_1$ (or
using another expression for $i$, such as $\lambda^2 = (re^{i\theta})^2 = i = e^{i5\pi/2}$). The
other eigenvalue is

\[ \lambda_2 = -\lambda_1 = -\frac{1}{\sqrt{2}} - i\frac{1}{\sqrt{2}}. \]

Exercise 13.5 The eigenvalues of $A$ are $2 \pm i$. If


Ler tag 2+i 0 
P=(°3 i) ang D=(*4 pa): 


then $P^{-1}AP = D$.


Exercise 13.6 (a) To show that v is an eigenvector, calculate Av, 


-E O-O 


Hence $\mathbf{v}$ is an eigenvector with corresponding eigenvalue $\lambda = -2$.
(b) Solve $(A - \lambda I)\mathbf{v} = \mathbf{0}$ for $\lambda = 4 + 2i$ to see if it is an eigenvalue and
to find a corresponding eigenvector.


1—2 5 Zs 
a-an=] a. or “5 


4 0 —6— 2i 
1 0 -3-4i 
~ (0 5 ip). 
0 0 0 


Hence $\mathbf{x} = (3 + i, 1 + i, 2)^T$ is an eigenvector corresponding to the
eigenvalue $\lambda = 4 + 2i$.

(c) Since the matrix $A$ is real, complex eigenvalues appear in conjugate
pairs, so that $4 - 2i$ is also an eigenvalue with corresponding
eigenvector $\bar{\mathbf{x}} = (3 - i, 1 - i, 2)^T$. If

\[
P = \begin{pmatrix} 3+i & 3-i & 0 \\ 1+i & 1-i & 1 \\ 2 & 2 & 1 \end{pmatrix}
 \quad \text{and} \quad
D = \begin{pmatrix} 4+2i & 0 & 0 \\ 0 & 4-2i & 0 \\ 0 & 0 & -2 \end{pmatrix},
\]

then $P^{-1}AP = D$.
$A$ cannot be unitarily diagonalised. To show this, either show that
$A$ is not normal (show $A^*A \neq AA^*$), or show that it is not possible to
form an orthonormal basis of eigenvectors of $A$. For example,
$\langle \mathbf{x}, \mathbf{v} \rangle = \mathbf{v}^*\mathbf{x} = 3 + i \neq 0$.


Exercise 13.7 Calculate (v1, v2) to show that the vectors are orthogonal. 
Then check that each vector has unit length. 

Extend this to an orthonormal basis of $\mathbb{C}^3$ by taking any vector not
in $\mathrm{Lin}(S)$ and using Gram-Schmidt. For example, use $(1, 0, 0)^T$ and


set 
1 1 1 1 1 
0 0 0 0 
1 142i 1 142i 
0 2+ 2i 24+2i 
to obtain 
2 
1 
—3 +i 
Then, if 


2 
V3 = — = SE ry 
the vectors $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ are an orthonormal basis of $\mathbb{C}^3$. (You should
check that $\mathbf{v}_3$ is orthogonal to the other two vectors.)


Exercise 13.8 The answers to these questions are contained in the text. 


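Exercise 13.9 Let $\lambda$ be an eigenvalue of $A$ and let $\mathbf{x}$ be an eigenvector
corresponding to $\lambda$, so that $A\mathbf{x} = \lambda\mathbf{x}$. As $A$ is unitary, $AA^* = A^*A = I$,
so that $\mathbf{x}^*A^*A\mathbf{x} = \mathbf{x}^*I\mathbf{x} = \mathbf{x}^*\mathbf{x}$. Also,

\[ \mathbf{x}^*A^*A\mathbf{x} = (A\mathbf{x})^*(A\mathbf{x}) = (\lambda\mathbf{x})^*(\lambda\mathbf{x}) = \bar{\lambda}\lambda\,\mathbf{x}^*\mathbf{x} = |\lambda|^2\mathbf{x}^*\mathbf{x}. \]

Equating the two expressions for $\mathbf{x}^*A^*A\mathbf{x}$, we have $\mathbf{x}^*\mathbf{x} = |\lambda|^2\mathbf{x}^*\mathbf{x}$, from
which we obtain

\[ (1 - |\lambda|^2)\mathbf{x}^*\mathbf{x} = 0. \]

As $\mathbf{x}$ is an eigenvector, $\mathbf{x}^*\mathbf{x} = \|\mathbf{x}\|^2 \neq 0$, so we can conclude that $|\lambda|^2 = 1$
and so $|\lambda| = 1$ (since the modulus of a complex number is a real
non-negative number).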
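Exercise 13.10 The eigenvalues of the real symmetric matrix $A$ are
$2, -2, 16$. You diagonalised this matrix in Exercise 11.1. An orthogonal
matrix $P$ and a diagonal matrix $D$ which orthogonally diagonalise $A$
are

\[
P = \begin{pmatrix} 0 & -1/\sqrt{2} & 1/\sqrt{2} \\ 1 & 0 & 0 \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}, \qquad
D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 16 \end{pmatrix},
\]

so that $P^TAP = D$.
Then $A = 2E_1 - 2E_2 + 16E_3$, where

\[
E_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
E_2 = \frac{1}{2} \begin{pmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{pmatrix}, \qquad
E_3 = \frac{1}{2} \begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{pmatrix}.
\]

A quick calculation should convince you that these matrices have the
required properties.

Since the matrices $E_1, E_2, E_3$ have these properties, for any
real numbers $\alpha_1, \alpha_2, \alpha_3$, and any positive integer $n$, we can conclude
that $(\alpha_1E_1 + \alpha_2E_2 + \alpha_3E_3)^n = \alpha_1^nE_1 + \alpha_2^nE_2 + \alpha_3^nE_3$. In particular,
to find a matrix $B = \alpha_1E_1 + \alpha_2E_2 + \alpha_3E_3$ with $B^3 = A$ as
required, we use

\[ B = 2^{1/3}E_1 + (-2)^{1/3}E_2 + 16^{1/3}E_3, \]

which (after simplification) gives us

\[
B = \frac{1}{\sqrt[3]{4}} \begin{pmatrix} 1 & 0 & 3 \\ 0 & 2 & 0 \\ 3 & 0 & 1 \end{pmatrix}.
\]

You should check all this.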


Exercise 13.11 (a) The proof that $A^*A$ is Hermitian is straightforward:
$(A^*A)^* = A^*(A^*)^* = A^*A$, so $A^*A$ is Hermitian. Every
Hermitian matrix is normal, so $A^*A$ is normal. (This can also be proved
directly.)

(b) We have

\[ \mathbf{v}^*(A^*A)\mathbf{v} = (A\mathbf{v})^*(A\mathbf{v}) = \langle A\mathbf{v}, A\mathbf{v} \rangle \geq 0 \]

and $\mathbf{v}^*(A^*A)\mathbf{v} = 0$ only if $A\mathbf{v} = \mathbf{0}$ by properties of the inner product.
Since $A$ has full column rank, $A\mathbf{v} = \mathbf{0}$ has only the trivial solution
$\mathbf{v} = \mathbf{0}$. So $\mathbf{v}^*(A^*A)\mathbf{v} > 0$ for all $\mathbf{v} \in \mathbb{C}^n$, $\mathbf{v} \neq \mathbf{0}$, which shows that $A^*A$
is positive definite.

Similarly, if $A^*A\mathbf{v} = \mathbf{0}$, then multiplying both sides of this equality
on the left by $\mathbf{v}^*$ we can conclude that $A\mathbf{v} = \mathbf{0}$, and hence that $\mathbf{v} = \mathbf{0}$.
That is, $A^*A\mathbf{v} = \mathbf{0}$ has only the trivial solution $\mathbf{v} = \mathbf{0}$, which implies
that $A^*A$ is invertible.


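Exercise 13.12 The matrix $B$ is not Hermitian, but you can easily
check that it is normal and can therefore be unitarily diagonalised.
To diagonalise, we see immediately from the matrix $B$ that $\lambda = 1$ is
an eigenvalue with corresponding eigenvector $\mathbf{x}_1 = (0, 0, 1)^T$. Solving
the characteristic equation, you should find that the other two
eigenvalues are $\lambda_2 = 1 + i$ and $\lambda_3 = 1 - i$ with corresponding
eigenvectors $\mathbf{x}_2 = (1, 1, 0)^T$ and $\mathbf{x}_3 = (1, -1, 0)^T$, respectively. Note that these
eigenvectors are mutually orthogonal. Normalising, we have

\[
P = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & \sqrt{2} \end{pmatrix} = (\mathbf{u}_1\ \mathbf{u}_2\ \mathbf{u}_3), \qquad
D = \begin{pmatrix} 1+i & 0 & 0 \\ 0 & 1-i & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]

Then $B = (1 + i)E_1 + (1 - i)E_2 + E_3$, where $E_i = \mathbf{u}_i\mathbf{u}_i^*$, $i = 1, 2, 3$.
We find

\[
E_1 = \frac{1}{2} \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
E_2 = \frac{1}{2} \begin{pmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]

and

\[
E_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]

Therefore,

\[
B = \frac{1+i}{2} \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
 + \frac{1-i}{2} \begin{pmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
 + \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]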
adjoint, 114
and geometric multiplicity, 269
angle between vectors, 30
argument, 394 


associated homogeneous system, 78 


augmented matrix, 63 
change of, 225
definition, 181 
extending, 190 
of column space, 194 
of null space, 193 
of row space, 192 
standard, 177 


Cauchy—Schwarz inequality, 315 
change of basis, 225, 230, 261 
as transformation, 226 
change of coordinates, 223—229 
change of variable 
difference equations, 286 
differential equations, 299 
characteristic equation, 248 
characteristic polynomial 
definition, 248 
of similar matrices, 262 
coefficient matrix, 61 
coefficients, 59 
cofactors, 99 
and inverse, 116 
expansion, 100 
matrix of, 114 
collinear, 57 
column space, 140, 161, 191 
column vector, 23 
complex conjugate, 390 


complex conjugate of matrix, 399 
complex inner product, 401—404 
definition, 403 
complex inner product space, 403 
complex matrices, 399—420 
complex matrix, 399 
column space, 399 
eigenvalue, 399 
eigenvector, 399 
null space, 399 
range, 399 
complex number 
imaginary part, 390 
real part, 390 
complex numbers, 389-397 
algebra of, 390-391
argument, 394 
DeMoivre’s theorem, 395 
division, 391 
Euler’s theorem, 396 
exponential form, 396 
modulus, 394 
multiplication, 390 
polar form, 394 
principal argument, 394 
viewed geometrically, 393 
complex plane, 393 
complex vector space, 398-399 


composition of transformations, 217 


conic sections, 232, 351-355 
conjugate pairs, 392 
consistent linear system, 69 
consumption matrix, 120 
coordinates, 185—186 

change of, 225 
coplanar, 127 
Cramer’s rule, 117 
cross product, see vector product 


definiteness 
and eigenvalues, 342
and principal minors, 345, 346 
test, 345, 346 


demand vector, 121 
DeMoivre’s theorem, 395 
determinant, 18, 98—113 
2 × 2, 18
as sum of signed products, 103 
effect of row operations, 110
of a product, 112 
through cofactors, 99 
through row operations, 106 
diagonal of a matrix, 11 
diagonalisable 
characterisation, 271 
diagonalisable matrix, 256 
diagonalisation 
and change of basis, 261 
and eigenvectors, 257—260 
and matrix powers, 280 
definition, 256 
geometrical view, 260-262 
orthogonal, 329 
unitary, 412 
difference equations, 282 
system, 282—290 
differential equations, 296—303 
linear system, 297 
dimension, 186—191 
definition, 188 
dimension theorem 
matrices, 198 
transformations, 222 
direct sum, 365 


and orthogonal complement, 368 


distribution vector, 292 
dot product, 25 


echelon form, 66 

eigenspace, 252 

eigenvalues, 247—256 
and definiteness, 342 
and determinant, 253 
and trace, 254 
definition, 247 
distinct, 265 
finding, 247-252 
of complex matrix, 399 

of Hermitian matrix, 409 

of similar matrices, 263 

of symmetric matrix, 332 
eigenvectors, 247-256 

definition, 247 

finding, 247—252 

of complex matrix, 399 

of similar matrices, 263 

of symmetric matrix, 332 
elementary matrix 

definition, 92 
elementary product, 103 
elementary row operations, 64 
ellipse, 232 
equivalence relation, 94 
Euler’s formula, 396 
exponential form of complex number, 

396 


full column rank, 372 
functions 

pointwise addition, 150 

pointwise scalar multiplication, 150 
Fundamental Theorem of Algebra, 391 


Gauss-Jordan elimination, 64 
Gaussian elimination, 64—78, 133-139 
leading variables, 72 
non-leading variables, 72 
geometric multiplicity, 269-272 
and algebraic multiplicity, 269 
definition, 269 
geometry of vectors, 27—33 
Gram-Schmidt process, 321-323 


Hermitian conjugate, 407 
Hermitian matrix, 408 
has real eigenvalues, 409 
orthogonal eigenvectors, 410 
homogeneous system, 75 
associated, 78 
homogeneous systems 
are consistent, 75 
hyperplane, 47, 164, 188 


idempotent, 374 
identity matrix, 15 
identity transformation, 216 
imaginary number, see complex 
numbers 
imaginary part 
complex number, 390 
indefinite quadratic form, 341 
initial condition, 282, 284, 297 
inner product, 24, 312-314 
and angle, 30, 316 
and length, 30, 315
and norm, 315, 403 
and orthogonality, 32, 404 
and unitary matrix, 411
Cauchy—Schwarz inequality, 315 
complex, 403 
definition, 313 
Euclidean, 312 
geometrical interpretation, 30-32 
properties, 25 
standard, 24, 312 
inner product space, 313 
complex, 403 
input—output analysis, 119 
inverse 
of linear transformation, 218 
inverse matrix, 16-20, 94-98, 
113-118 
2 × 2, 18
by cofactors, 116 
definition, 17 
is unique, 17 
inverse of product, 20 
invertible matrix, 18 
isometry, 355 


kernel 
of linear transformation, 220 
of matrix, 78 


leading variables, 72 
least squares solution, 380-383 
length of a vector, 29 
Leontief input-output analysis, 119 
linear combination, 24, 140, 153, 160 
closure under, 158 
non-trivial, 172 
uniqueness, 175 
linear dependence, 172-181 
linear equations, see linear system 
linear function, see linear transformation 
linear independence, 172—181 
test, 175-179 
linear mapping, 210 
linear operator, 210 
linear span, 160 
linear system, 59-78 
augmented matrix for, 63 
coefficient matrix, 61 
coefficients, 59 
consistent, 69 
general solution, 73 
geometrical interpretation, 69 
homogeneous, 75 
infinitely many solutions, 73 
of difference equations, 282—290 
of differential equations, 297 
particular solution, 73 


Principle of Linearity, 79 
rank, 135-139 
row operations, 64 
solution of, 60 
solution set, 74 
vector form of solution, 73 
vector solution, 138 
linear transformation, 210—223 
and matrices, 212-216, 220 
change of basis, 230 
composition, 217 
corresponding to matrix, 212 
definition, 210 
dimension theorem, 222 
examples, 211—212 
idempotent, 374 
identity, 216 
inverse, 218 
kernel, 220 
matrix of, 213, 220, 230 
null space, 220 
range, 220 
rank, 221 
rank—nullity theorem, 222 
rotation, 215 
zero, 216 
lines, 33-39, 162, 188 
Cartesian equation, 33, 37 
coincident, 37 
coplanar, 38 
skew, 38 
vector equation, 33—36 
long-term distribution, 293 


Markov chain, 290-296 
distribution vector, 292 
long-term distribution, 293 
regular, 294 
state vector, 292 
transition matrix, 292 

Markov process, 291 

matrix, 10 
addition, 11 

additive identity, 15 
associative, 14 
commutative, 14 
algebra, 14-16 
associativity, 14 
distributivity, 15 
characteristic polynomial, 248 
cofactors, 99 
column space, 161, 191 
complex conjugate, 399 
corresponding transformation, 212 
definition of, 10 
determinant, see determinant 
diagonalisable, 256 
dimension theorem, 198 


eigenvalues of, 247 
eigenvectors of, 247 
elementary, 92 
entry, 10 
equality, 11 
Hermitian, 408 
Hermitian conjugate, 407 
idempotent, 374 
identity, 15 
ill-conditioned, 89 
inverse, 16—20, 94—98, 113-118 
2 × 2, 18
definition, 17 
is unique, 17 
of product, 20 
invertible, 18 
kernel of, 78 
minor, 99 
multiplication, 12—14 
associative, 14 
multiplicative identity, 15
non-commutative, 13 
multiplication by a scalar, 11 
non-invertible, 18 
non-singular, 18 
normal, 412 
null space, 78, 158, 191, 220 
nullity, 195 
of cofactors, 114 
of linear transformation, 213, 220 
of quadratic form, 340 
of rotation, 215 
orthogonal, 319 
range, 139, 220 
rank, 132, 195 
rank—nullity theorem, 198 
row equivalence, 94 
row space, 161, 191 
singular, 18 
size, 10 
skew symmetric, 56 
skew-Hermitian, 429 
square, 11
stochastic, 295 
symmetric, 22 
trace, 254 
transpose, 20 
definition, 21 
properties, 21—22 
unitary, 410 
zero, 15 
matrix diagonal, 11
matrix inverse 
of product, 20 
matrix power, 279 
matrix powers, 20 
and spectral decomposition, 418 
via diagonalisation, 280 
minor, 99
principal, 344 
modulus, 394 
multiplication of matrices, 12—14 
multiplicity 
algebraic, 269 
geometric, 269 


negative definite, 341 
negative semi-definite, 341 
non-diagonalisable matrices, 264 
non-invertible matrix, 18 
non-leading variables, 72 
non-singular matrix, 18 
non-trivial linear combination, 172 
norm 

definition, 315, 403 
normal matrix, 412 
normal vector, 41 
normalisation, 315 
null space, 191 

matrix, 158 

of linear transformation, 220 

of matrix, 78, 220 

orthogonal complement, 370 
nullity 

of matrix, 195 


orthogonal 
vectors, 32, 317, 404 
orthogonal complement, 367—372 
and direct sum, 368 
definition, 367 
is a subspace, 367 
of null space, 370 
of range, 370
orthogonal diagonalisation, 329 
of symmetric matrices, 329-339 
Spectral theorem, 331 
orthogonal matrix 
definition, 319 
has orthonormal columns, 321 
orthogonal projection, 374 
idempotent and symmetric, 376 
onto range, 377 
orthogonality, 316-320
and inner product, 32, 404 
and linear independence, 318 
orthonormal basis, 320 
orthonormal set 
and orthogonal matrix, 321 
and unitary matrix, 410 
definition, 320, 404 
orthonormalisation, 321—323 


permutation, 102 
inversion, 102 
planes, 39—46, 162, 188 
affine subsets, 163
as subspaces, 162 
Cartesian equation, 41—46, 184 
normal vector to, 41 
parametric equation, 39 
vector equation, 39 
polar form of complex number, 394 
portfolio, 88 
arbitrage, 89 
riskless, 88 
positive definite, 341 
positive semi-definite, 341 
power of a matrix, 20 
powers of a matrix, 279-281 
and difference equations, 284-286 
definition, 279 
principal argument, 394 
principal minor, 344 
and definiteness, 345, 346 
Principle of Linearity, 79 
production vector, 119 
projection, 372-378 
definition, 372 
idempotent, 375 
orthogonal, 374 
Pythagoras’ theorem 
generalised, 317 


quadratic form 
definition, 340 
indefinite, 341 
negative definite, 341 
negative semi-definite, 341 
positive definite, 341 
positive semi-definite, 341 
quadratic forms, 339-355 


range 
and column space, 162 
of linear transformation, 220 
of matrix, 139, 220 
orthogonal complement, 370 
rank 
and linear system, 135-139 
full column, 372 
of matrix, 132, 195 
of transformation, 221 
rank-nullity theorem 
matrices, 198 
transformations, 222 
real part 
complex number, 390 
RREF, 67 
reduced row echelon form, 67 
regular Markov chain, 294 
rigid motion, 355 
rotation, 215, 353 
row echelon form, 66 
row equivalence, 94
row operations, 62-64
and determinants, 110
row space, 161, 191
row vector, 23


scalar, 11 
scalar product, 25 
scalar triple product, 127 
signed elementary product, 103 
similarity, 231-235, 256 
and characteristic polynomial, 262 
and eigenvalues, 263 
and eigenvectors, 263 
simultaneous equations, 60 
singular matrix, 18 
size of a matrix, 10 
skew symmetric matrix, 56 
solution set of linear system, 74 
spectral decomposition, 415-420
and matrix powers, 418 
Spectral theorem, 331 
proof, 337-339 
square matrix, 11
standard basis, 177, 182 
state price, 89 
state vector, 292 
stochastic matrix, 295 
subspace 
definition, 154 
direct sum, 365 
examples, 155-157
planes, 162
spanned by a set, 160
sum, 364
test for, 157
sum of subspaces, 364 
symmetric matrix, 22
and quadratic form, 340
eigenvalues, 332
eigenvectors, 332
orthogonal diagonalisation, 329-339
system
of equations, see linear system


technology matrix, 120 
test of definiteness, 345, 346 
trace, 254 
transition matrix 
of basis, 225 
of Markov chain, 292
transpose, 20 
definition, 21 
properties, 21-22
triangle inequality, 317 
trivial solution, 75 


unit vector, 30 
unitary diagonalisation, 412 
unitary matrix, 410
and inner product, 411
has orthonormal columns, 410


vector, 23 
addition, 23 
column, 23 
components of, 23 
definition, 23 
direction of, 30 
dot product, 25 
entries of, 23 
geometrical interpretation, 27-33
inner product, 24 
length of, 29 
linear combination, 24 
orthogonal, 32, 317, 404 
parallel, 30 
perpendicular, 32 
row, 23 
scalar multiplication, 23 
scalar product, 25 
zero, 24
vector product, 127
vector space
axioms, 150 
definition, 150 
examples of, 151 
finite-dimensional, 188 
infinite-dimensional, 188 
subspace, 154 


zero matrix, 15 
zero transformation, 216 
zero vector, 24 


